Reputation: 329
I'm currently working on a project on my own. For this project I tried to compare the results of multiple algorithms, but I want to be sure that every algorithm tested is configured to give its best results.
So I use cross-validation to test every combination of parameters and choose the best one.
For example:
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV, ShuffleSplit

def KMeanstest(param_grid, n_jobs):
    estimator = KMeans()
    cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
    regressor = GridSearchCV(estimator=estimator, cv=cv, param_grid=param_grid, n_jobs=n_jobs)
    regressor.fit(X_train, y_train)
    print("Best Estimator learned through GridSearch")
    print(regressor.best_estimator_)
    return cv, regressor.best_estimator_
param_grid = {'n_clusters': [2],
              'init': ['k-means++', 'random'],
              'max_iter': [100, 200, 300, 400, 500],
              'n_init': [8, 9, 10, 11, 12, 13, 14, 15, 16],
              'tol': [1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6],
              'precompute_distances': ['auto', True, False],
              'random_state': [42],
              'copy_x': [True, False],
              'n_jobs': [-1],
              'algorithm': ['auto', 'full', 'elkan']}

n_jobs = -1
cv, best_est = KMeanstest(param_grid, n_jobs)
But this is very time-consuming. I want to know if this method is the best, or if I should use a different approach.
Thank you for your help.
Upvotes: 1
Views: 2295
Reputation: 578
The problem with GridSearch is that it is very time-consuming, as you rightly said. RandomSearch can sometimes be a good option, but it is not optimal.
Bayesian Optimization is another option. It allows us to rapidly zero in on the optimal parameter set using a probabilistic approach. I have tried it personally using the hyperopt library in Python and it works really well. Check out this tutorial for more information. You can also download the associated notebook from my GitHub.
The good thing is that since you have already experimented with GridSearch, you have a rough idea of which parameter ranges do not work well. So you can define a more accurate search space for the Bayesian Optimization to run on, and this will reduce the time even more. Also, hyperopt can be used to compare multiple algorithms and their respective parameters.
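To give a rough idea, here is a minimal sketch of what a hyperopt version of your KMeans search could look like. This is not the tutorial's code, just an illustration; it assumes X_train is the same training data from your question and only covers a few of your parameters:

import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

def objective(params):
    # KMeans.score returns the negative inertia (higher is better);
    # hyperopt minimizes the objective, hence the sign flip
    model = KMeans(n_clusters=2, random_state=42, **params)
    return -cross_val_score(model, X_train, cv=3).mean()

space = {'init': hp.choice('init', ['k-means++', 'random']),
         'max_iter': hp.choice('max_iter', [100, 200, 300, 400, 500]),
         # sample tol on a log scale instead of a fixed list
         'tol': hp.loguniform('tol', np.log(1e-6), np.log(1e-1))}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=Trials())
# note: for hp.choice entries, best holds indices into the lists above
print(best)

Each trial uses the results of previous trials to pick the next candidate, so 50 evaluations can often get close to what an exhaustive grid would find.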
Upvotes: 3
Reputation: 4893
Besides Random Search and Grid Search, there are tools and libraries for more intelligent hyperparameter tuning. I used Optuna successfully, but there are a few more out there.
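For example, a minimal sketch of the question's KMeans search with Optuna could look like this (assuming X_train is the training set from the question; the ranges mirror the original grid):

import optuna
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score

def objective(trial):
    # sample one candidate configuration per trial
    params = {'init': trial.suggest_categorical('init', ['k-means++', 'random']),
              'max_iter': trial.suggest_int('max_iter', 100, 500, step=100),
              'tol': trial.suggest_float('tol', 1e-6, 1e-1, log=True)}
    model = KMeans(n_clusters=2, random_state=42, **params)
    # KMeans.score is the negative inertia, so we maximize it
    return cross_val_score(model, X_train, cv=3).mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=50)
print(study.best_params)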
Upvotes: 3
Reputation: 2918
You can try Random Search in place of Grid Search. Random search is a technique where random combinations of the hyperparameters are used to find the best configuration for the model: instead of evaluating every combination, it evaluates the model at a fixed number of random points in the parameter space.
You can find the details on the sklearn documentation page, where a comparison between Random Search and Grid Search is given.
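As a minimal sketch, your search rewritten with sklearn's RandomizedSearchCV might look like this (assuming X_train from your question; scipy.stats.loguniform needs scipy >= 1.4):

from scipy.stats import loguniform
from sklearn.cluster import KMeans
from sklearn.model_selection import RandomizedSearchCV, ShuffleSplit

param_distributions = {'init': ['k-means++', 'random'],
                       'max_iter': [100, 200, 300, 400, 500],
                       'n_init': range(8, 17),
                       # a continuous distribution instead of a fixed list
                       'tol': loguniform(1e-6, 1e-1)}

cv = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)
search = RandomizedSearchCV(KMeans(n_clusters=2, random_state=42),
                            param_distributions=param_distributions,
                            n_iter=30, cv=cv, n_jobs=-1, random_state=42)
search.fit(X_train)  # KMeans ignores y, so no labels are needed
print(search.best_estimator_)

Here n_iter=30 means only 30 configurations are evaluated instead of the full grid, which is where the time savings come from.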
I hope you find this useful.
Upvotes: 1