Reputation: 51
I want to use inertia_, an attribute of [K-means](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.cluster), as the scoring function in GridSearchCV.
I tried to define a custom scoring function using [make_scorer](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html#sklearn.metrics.make_scorer).
The problem I am facing is: "You can not use inertia_ attribute of k-means in my_scorer because at the time of the execution of my_scorer function, the clustering algorithm isn't fit yet".
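
    from sklearn import metrics
    from sklearn.cluster import KMeans
    from sklearn.model_selection import GridSearchCV

    kmeans = KMeans(n_jobs=-1)
    grid_param = {'n_clusters': [2, 5, 8, 14, 20, 25, 30]}

    # Intended to return the inertia of the fitted clustering
    def custom_scoring(fit_obj):
        return fit_obj.inertia_

    gd_sr = GridSearchCV(estimator=kmeans,
                         param_grid=grid_param,
                         scoring=metrics.make_scorer(custom_scoring,
                                                     greater_is_better=False),
                         n_jobs=-1)
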
Upvotes: 2
Views: 5027
Reputation: 176
Before getting into whether a scorer can be built from inertia, I would advise you to consider whether it is a good idea to do so. inertia_ is the sum of squared distances of samples to their closest cluster center. Even if you manage to use it as your scorer, you would essentially always end up with max(n_clusters) as your grid search result. Here is why I believe this would happen.
If you plot inertia on the y-axis against the number of clusters, then at the left end of the graph, with a single cluster, the cluster center is the mean of the data, so inertia_ is the sum of squared deviations of all samples from that mean, which is proportional to the variance of the data. At the right end, with as many clusters as samples, you get inertia_ = 0. In between, the value decreases as the number of clusters increases, so the best grid parameter under this scoring would always be the largest number of clusters. I don't see this being very useful. Please let me know if I am missing something.
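To make the argument above concrete, here is a minimal sketch on synthetic data. The dataset from make_blobs, the list of cluster counts and the variable names are assumptions chosen for illustration, not part of the question. It fits KMeans for several values of n_clusters and compares inertia_ at the two extremes described above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data, purely for illustration (not from the question).
X, _ = make_blobs(n_samples=60, centers=4, random_state=0)

# Sum of squared deviations from the overall mean: what a single cluster gives.
total_ss = ((X - X.mean(axis=0)) ** 2).sum()

inertias = {}
for k in [1, 2, 5, 20, len(X)]:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"n_clusters={k:>2}  inertia_={value:.3f}")

# With one cluster, inertia_ matches the total sum of squares.
print(np.isclose(inertias[1], total_ss))
```

On data like this the printed inertia shrinks as n_clusters grows and is effectively zero once every sample has its own cluster, which is why a grid search that minimises inertia drifts to the largest value in the grid.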
Upvotes: 1
Reputation: 16640
As the error message says, you have to first run the fit() function on your KMeans object (fit it to the data) before using it as an estimator in GridSearchCV. Please refer to the example from the documentation to get an idea.
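For what it is worth, GridSearchCV also accepts a plain callable with the signature scorer(estimator, X, y) as its scoring argument, and that callable receives the estimator after it has been fitted on the training fold. The sketch below relies on that. The function name neg_inertia, the synthetic data and the shortened grid are assumptions for illustration only, and the value returned is the inertia of the training fold rather than a held-out score.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV

# Synthetic data, purely for illustration (not from the question).
X, _ = make_blobs(n_samples=300, centers=4, random_state=0)


def neg_inertia(estimator, X, y=None):
    """Hypothetical scorer: called with an estimator already fitted on the training fold."""
    # GridSearchCV maximises the score, so negate inertia (lower inertia is better).
    return -estimator.inertia_


search = GridSearchCV(
    KMeans(n_init=10, random_state=0),
    param_grid={"n_clusters": [2, 5, 8, 14, 20]},
    scoring=neg_inertia,  # callable scorer instead of make_scorer
)
search.fit(X)  # no labels are needed for clustering

print(search.best_params_)
```

Unlike make_scorer, which wraps a metric of the form metric(y_true, y_pred), this form has direct access to fitted attributes such as inertia_. KMeans.score(X) also returns the negative of the k-means objective on X, so leaving scoring unset gives a similar ranking computed on the held-out fold.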
Upvotes: 1