Reputation: 518
I am using GridSearchCV to find the optimal parameters for BIRCH. My code is:
```python
from sklearn.cluster import Birch
from sklearn.metrics import make_scorer, silhouette_score
from sklearn.model_selection import GridSearchCV, KFold

RAND_STATE = 50  # for reproducibility and consistency
folds = 3
k_fold = KFold(n_splits=folds, shuffle=True, random_state=RAND_STATE)
hyperparams = {"branching_factor": [50, 100, 200, 300, 400, 500, 600, 700, 800, 900],
               "n_clusters": [5, 7, 9, 11, 13, 17, 21],
               "threshold": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]}
birch = Birch()

def sil_score(ndata):
    labels = ensemble.predict(ndata)
    score = silhouette_score(ndata, labels)
    return score

sil_scorer = make_scorer(sil_score)
ensemble = GridSearchCV(estimator=birch, param_grid=hyperparams, scoring=sil_scorer,
                        cv=k_fold, verbose=10, n_jobs=-1)
ensemble.fit(x)  # x: the feature matrix (not shown in the question)
print(ensemble)
best_parameters = ensemble.best_params_
print(best_parameters)
best_score = ensemble.best_score_
print(best_score)
```
However, running it gives me an error:
I am confused about why the scorer expects 4 arguments when I already specified the parameters needed for scoring in the sil_score function.
Upvotes: 0
Views: 1194
Reputation: 8801
Your scoring function is incorrect. make_scorer expects a metric with the signature sil_score(y_true, y_pred), where y_true are the ground-truth labels and y_pred are the predicted labels; the scorer it returns is then called by GridSearchCV as scorer(estimator, X, y). Also, you should not predict the labels separately with the ensemble object inside your scoring function: ensemble is the outer GridSearchCV object, not the candidate model being evaluated. In your case there are no ground-truth labels and silhouette_score needs the data itself, so it makes more sense to build the scorer directly around silhouette_score: skip make_scorer and pass a callable with the signature (estimator, X, y=None) that returns silhouette_score(X, estimator.predict(X)). GridSearchCV will then take care of predicting and scoring on its own for each candidate and fold.
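A minimal sketch of the (y_true, y_pred) signature in action, assuming a recent scikit-learn and synthetic make_blobs data (neither is from the question). adjusted_rand_score is used only because, unlike silhouette_score, it compares ground-truth labels with predicted labels and so fits the signature that make_scorer wraps:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, make_scorer

# Synthetic data with known cluster labels (an assumption for illustration).
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# make_scorer expects a metric of the form metric(y_true, y_pred).
ari_scorer = make_scorer(adjusted_rand_score)

model = Birch(n_clusters=3).fit(X)

# The scorer itself is called as scorer(estimator, X, y): it runs
# model.predict(X) internally and passes (y, predicted_labels) to the metric.
score = ari_scorer(model, X, y)
print(score)
```

Note that the wrapped metric never sees X, which is why a make_scorer-based silhouette scorer cannot work.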
Here is an example if you want to see how it works.
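A minimal sketch of the silhouette approach, assuming scikit-learn 0.24 or later and synthetic make_blobs data standing in for the question's x. The function name silhouette_scorer and the reduced grid are illustrative, not from the question. The scorer is passed to scoring directly as a callable with the signature (estimator, X, y=None), without make_scorer, so GridSearchCV hands it each candidate's fitted estimator and the validation fold:

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV, KFold

RAND_STATE = 50  # for reproducibility and consistency

# Stand-in for the question's feature matrix x (hypothetical data).
x, _ = make_blobs(n_samples=600, centers=5, random_state=RAND_STATE)

def silhouette_scorer(estimator, X, y=None):
    """Score a fitted clusterer on a validation fold.

    GridSearchCV calls scoring callables as scoring(estimator, X[, y]);
    y is None here because fit() is called without labels.
    """
    labels = estimator.predict(X)  # the fitted candidate, not the outer search
    if len(np.unique(labels)) < 2:
        return -1.0  # silhouette is undefined for a single cluster
    return silhouette_score(X, labels)

hyperparams = {
    "branching_factor": [50, 100],
    "n_clusters": [3, 5, 7],
    "threshold": [0.3, 0.5],
}

k_fold = KFold(n_splits=3, shuffle=True, random_state=RAND_STATE)
search = GridSearchCV(
    estimator=Birch(),
    param_grid=hyperparams,
    scoring=silhouette_scorer,  # plain callable, no make_scorer
    cv=k_fold,
)
search.fit(x)

print(search.best_params_)
print(search.best_score_)
```

Returning -1.0 when a fold ends up with a single predicted cluster is a choice made in this sketch so that such a candidate ranks lowest instead of raising; it is not built-in scikit-learn behaviour.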
Upvotes: 2