Reputation: 1962
Sorry if this is a duplicate.
I have a two-class prediction model; it has n configurable (numeric) parameters. The model can work pretty well if you tune those parameters properly, but the specific values for those parameters are hard to find. I used grid search for that (providing, say, m values for each parameter). This yields m^n models to train, which is very time-consuming even when run in parallel on a machine with 24 cores.
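For reference, here is roughly what my current grid search looks like; the parameter names and values are made up for illustration:

```python
from itertools import product

# Made-up example: m = 3 candidate values for each of n = 3 parameters.
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "beta": [1, 5, 10],
    "gamma": [0.2, 0.5, 0.8],
}

# One full training run per combination: m ** n = 27 here,
# and it explodes quickly as m and n grow.
for values in product(*param_grid.values()):
    params = dict(zip(param_grid, values))
    print(params)  # in reality: train the model and record precision/recall
```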
I tried fixing all parameters but one and varying only that one parameter (which yields only m × n training runs), but it's not obvious to me what to do with the results I got. Here is a sample plot of precision (triangles) and recall (dots) for negative (red) and positive (blue) samples:
Simply taking the "winner" value for each parameter obtained this way and combining them doesn't lead to the best (or even good) prediction results. I thought about fitting a regression on the parameter sets with precision/recall as the dependent variable, but I don't expect a regression with more than 5 independent variables to be much faster than the grid search scenario.
What would you propose to find good parameter values in reasonable estimation time? Sorry if this has an obvious (or well-documented) answer.
Upvotes: 3
Views: 276
Reputation: 43477
I would use a randomized grid search: pick random values for each of your parameters within a range you deem reasonable, evaluate each such randomly chosen configuration, and run the search for as long as you can afford to. The paper "Random Search for Hyper-Parameter Optimization" (Bergstra and Bengio, JMLR 2012) runs experiments showing that this is at least as good as a grid search:
Grid search and manual search are the most widely used strategies for hyper-parameter optimization. This paper shows empirically and theoretically that randomly chosen trials are more efficient for hyper-parameter optimization than trials on a grid. Empirical evidence comes from a comparison with a large previous study that used grid search and manual search to configure neural networks and deep belief networks. Compared with neural networks configured by a pure grid search, we find that random search over the same domain is able to find models that are as good or better within a small fraction of the computation time.
For what it's worth, I have used scikit-learn's randomized search (RandomizedSearchCV) for a problem that required optimizing about 10 hyper-parameters for a text classification task, with very good results in only around 1000 iterations.
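For illustration, here is a minimal sketch of that kind of search; the classifier, parameter names, ranges, and toy data are placeholder assumptions, not the ones from my actual task:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy two-class data standing in for the real problem.
X, y = make_classification(n_samples=1000, n_classes=2, random_state=0)

# One distribution per tunable parameter, over ranges you deem reasonable.
param_distributions = {
    "n_estimators": randint(50, 500),
    "max_depth": randint(2, 20),
    "min_samples_leaf": randint(1, 10),
}

# n_iter is the budget: evaluate as many random configurations as you can afford.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=100,
    scoring="f1",  # or precision/recall, whichever matters for your task
    cv=5,
    n_jobs=-1,     # e.g. to use all 24 cores
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```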
Upvotes: 2
Reputation: 76297
I'd suggest the simplex algorithm (Nelder-Mead downhill simplex) combined with simulated annealing:
Very simple to use: give it n + 1 starting points and let it run until some configurable stopping criterion is met (a maximum number of iterations, or convergence).
Implemented in virtually every language.
Doesn't require derivatives.
More resilient to local optima than the method you're currently using (see the sketch below).
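Here is a minimal sketch of the simplex part using SciPy's Nelder-Mead implementation; the simulated-annealing component (e.g. randomized restarts from perturbed starting points) is left out for brevity, and evaluate_model is a hypothetical placeholder for training your model and scoring it on a validation set:

```python
import numpy as np
from scipy.optimize import minimize

def evaluate_model(params):
    # Hypothetical placeholder: train the classifier with these n
    # parameter values and return a loss to minimize, e.g. 1 - F1
    # on held-out data. A smooth dummy objective is used here so
    # that the sketch actually runs.
    return float(np.sum((params - np.array([1.0, 2.0, 3.0])) ** 2))

# Initial guess for the n parameters; Nelder-Mead builds the
# required simplex of n + 1 vertices around it automatically.
x0 = np.zeros(3)

result = minimize(
    evaluate_model,
    x0,
    method="Nelder-Mead",  # derivative-free simplex search
    options={"maxiter": 500, "xatol": 1e-3, "fatol": 1e-3},
)
print(result.x, result.fun)
```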
Upvotes: 1