Department of Computer Science

Technical Report No. 293 - Abstract

Jan N. van Rijn, Frank Hutter
An Empirical Study of Hyperparameter Importance Across Datasets.

With the advent of automated machine learning, automated hyperparameter optimization methods are by now routinely used. However, this progress is not yet matched by equal progress on automatic analyses that yield information beyond performance-optimizing hyperparameter settings. Various post-hoc analysis techniques exist to analyze hyperparameter importance, but to the best of our knowledge, so far these have only been applied at a very small scale. To fill this gap, we conduct a large-scale experiment to discover general trends across 100 datasets. The results in case studies with random forests and AdaBoost show that the same hyperparameters typically remain most important across datasets. Overall, these results, obtained fully automatically, provide a quantitative basis to focus efforts in both manual algorithm design and in automated hyperparameter optimization.

Report No. 293 (PDF)