Navigating hyper-parameters

Question

I was just wondering if someone could provide a good source for me to read on how I should approach choosing hyper-parameters of the solver based on the complexity of my problem.

Basically, I understand that many people feel they are "shooting in the dark" when setting and then modifying these parameters, and I have not been able to find a system or benchmark for choosing parameters based on the complexity of a specific problem or dataset.

If you care to explain your own methodology or simply provide commentary on your source, it would be much appreciated.

Flavio Ferrara · Accepted Answer

Since the hyperparameters we're talking about are related to backpropagation, which is a gradient-based approach, I believe the main reference is Y. Bengio, along with the more classic LeCun et al.

There are three main approaches to finding the optimal value of a hyperparameter. The first two are well explained in the first paper I linked.

Manual search. The researcher chooses the optimal value through trial and error.
Automatic search. The researcher relies on an automated routine in order to speed up the search.
Bayesian Optimization. You can find a video presenting it here.
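
To make the second approach concrete, here is a minimal sketch of one form of automatic search: random search over a single hyperparameter, the learning rate. Everything in it is an assumption made for illustration and does not come from the linked papers. The toy objective f(w) = (w - 3)^2 stands in for a real training run, and the names train_and_validate and random_search, the search range and the number of trials are all hypothetical.

```python
import math
import random


def train_and_validate(learning_rate, steps=50):
    """Toy stand-in for a training run (assumption, not a real model).

    Minimises f(w) = (w - 3)^2 with plain gradient descent and returns
    the final loss, which plays the role of a validation score.
    """
    w = 0.0
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)^2
        w -= learning_rate * grad   # gradient descent update
    return (w - 3.0) ** 2


def random_search(n_trials=30, low=1e-4, high=1.0, seed=0):
    """Sample learning rates log-uniformly in [low, high]; keep the best."""
    rng = random.Random(seed)       # fixed seed so the search is repeatable
    best_lr, best_loss = None, math.inf
    for _ in range(n_trials):
        # Sample the exponent uniformly, so each decade is equally likely.
        lr = 10 ** rng.uniform(math.log10(low), math.log10(high))
        loss = train_and_validate(lr)
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss


best_lr, best_loss = random_search()
print(f"best learning rate: {best_lr:.4g}, final loss: {best_loss:.3g}")
```

In a real setting, train_and_validate would train the network and return the error on a held-out validation set. Sampling on a log scale is a common choice for learning rates because plausible values span several orders of magnitude.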
