Gerard

Reputation: 518

GridSearchCV to find the optimum parameters for BIRCH

I am using GridSearchCV to find the optimum parameters for BIRCH. My code is:

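from sklearn.cluster import Birch
from sklearn.metrics import make_scorer, silhouette_score
from sklearn.model_selection import GridSearchCV, KFold

RAND_STATE=50  # for reproducibility and consistency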
folds=3
k_fold = KFold(n_splits=folds, shuffle=True, random_state=RAND_STATE)

hyperparams = { "branching_factor": [50,100,200,300,400,500,600,700,800,900],
                "n_clusters": [5,7,9,11,13,17,21],
                "threshold": [0.2,0.3,0.4,0.5,0.6,0.7]}
birch = Birch()

def sil_score(ndata):
    labels = ensemble.predict(ndata)
    score = silhouette_score(ndata, labels)
    return score

sil_scorer = make_scorer(sil_score)

ensemble = GridSearchCV(estimator=birch,param_grid=hyperparams,scoring=sil_scorer,cv=k_fold,verbose=10,n_jobs=-1)

ensemble.fit(x)
print(ensemble)
best_parameters = ensemble.best_params_
print(best_parameters)
best_score = ensemble.best_score_
print(best_score)

However, running it gives me an error:

[image: screenshot of the error]

I am confused about why the scoring call is looking for 4 arguments when I have already stated the parameters needed for scoring in the sil_score function.

Upvotes: 0

Views: 1194

Answers (1)

Gambit1614

Reputation: 8801

Your scoring function has the wrong signature. A function wrapped with make_scorer must have the form sil_score(y_true, y_pred), where y_true are the ground-truth labels and y_pred are the predicted labels. You also do not need to predict the labels with the ensemble object inside your scoring function; GridSearchCV hands the scorer each fitted candidate estimator itself. In your case it makes more sense to score with silhouette_score directly: use it through a scorer that takes the fitted estimator and the data, and GridSearchCV will take care of the fitting and predicting on its own.

Here is an example if you want to see how it works.
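
As a minimal sketch of that idea (separate from the linked example), assuming scikit-learn's documented convention that scoring may be a callable with the signature scorer(estimator, X, y). The helper name silhouette_scorer, the synthetic make_blobs data standing in for x, and the reduced parameter grid are all assumptions made for illustration. Since silhouette_score takes the data and the labels rather than (y_true, y_pred), this sketch wraps it in a small callable instead of passing it through make_scorer.

```python
import numpy as np
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.model_selection import GridSearchCV, KFold


def silhouette_scorer(estimator, X, y=None):
    """Hypothetical scorer using the (estimator, X, y) callable signature."""
    # The estimator passed in has already been fitted on the training fold.
    labels = estimator.predict(X)
    # silhouette_score needs at least two distinct labels; fall back to the
    # worst possible value when the fold collapses into a single cluster.
    if len(np.unique(labels)) < 2:
        return -1.0
    return silhouette_score(X, labels)


# Synthetic stand-in for the asker's data `x` (an assumption).
x, _ = make_blobs(n_samples=300, centers=5, random_state=50)

# Reduced grid so the sketch runs quickly; swap in the full grid as needed.
param_grid = {
    "branching_factor": [50, 100],
    "n_clusters": [3, 5, 7],
    "threshold": [0.3, 0.5],
}
k_fold = KFold(n_splits=3, shuffle=True, random_state=50)

search = GridSearchCV(
    estimator=Birch(),
    param_grid=param_grid,
    scoring=silhouette_scorer,
    cv=k_fold,
)
search.fit(x)  # no y: clustering is unsupervised

print(search.best_params_)
print(search.best_score_)
```

Because fit is called without y, the scorer must not require one, which is why y defaults to None in this sketch.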

Upvotes: 2
