Sklearn - GridSearchCV with v_measure_score does NOT give the same result as a for-loop

Question

I am using GridSearchCV with v_measure_score as the scorer and comparing the result
with the same search written as a plain for-loop, WITHOUT GridSearchCV.

The best v_measure_score from the for-loop is 0.69816019299, at percentile 27;
the best score from GridSearchCV is 0.565562627046, at percentile 12.

I expected the two results to be the same.
I have checked my code several times but still cannot figure out why they differ. Here is my code:

GridSearchCV

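from sklearn import cluster
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# apiVectorArray (term counts), yTarget (true labels) and clusterNum are defined earlier.
estimators = [('tfIdf', TfidfTransformer()), ('sPT', SelectPercentile()), ('kmeans', cluster.KMeans())]
pipe = Pipeline(estimators)
params = dict(tfIdf__smooth_idf=[True],
              sPT__score_func=[f_classif], sPT__percentile=range(100, 0, -1),
              # precompute_distances only affected speed and was removed in scikit-learn 1.0
              kmeans__n_clusters=[clusterNum], kmeans__random_state=[0])
v_measure_scorer = make_scorer(v_measure_score)
grid_search = GridSearchCV(pipe, param_grid=params, scoring=v_measure_scorer)
grid_search_fit = grid_search.fit(apiVectorArray, yTarget)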
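By default GridSearchCV does not score each candidate on the data it was fit on: it splits the data into folds (5 in current scikit-learn, 3 before version 0.22), fits the whole pipeline on the training folds, calls predict on the held-out fold, and reports the mean held-out score as best_score_. The sketch below is a minimal illustration under stated assumptions, not a fix for the original data: the synthetic count matrix, the short percentile list and n_init=10 (set explicitly because the KMeans default changed in recent releases) are all hypothetical stand-ins for apiVectorArray, yTarget and the original grid. It compares the default 5-fold search with a single hand-made split whose train and test indices are both the full dataset; under that split each candidate is fit and scored on the same samples, so its scores should match a plain for-loop.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for apiVectorArray / yTarget: 90 documents, 30 terms,
# three interleaved classes, each with higher counts on its own block of 10 terms.
rng = np.random.RandomState(0)
y = np.tile([0, 1, 2], 30)
lam = np.ones((90, 30))
for c in range(3):
    lam[y == c, c * 10:(c + 1) * 10] += 3.0
X = rng.poisson(lam)
n_clusters = 3

pipe = Pipeline([
    ("tfIdf", TfidfTransformer(smooth_idf=True)),
    ("sPT", SelectPercentile(f_classif)),
    # n_init is set explicitly so the result does not depend on the library default
    ("kmeans", KMeans(n_clusters=n_clusters, random_state=0, n_init=10)),
])
params = {"sPT__percentile": [10, 30, 50, 100]}
scorer = make_scorer(v_measure_score)

# Default behaviour: mean v-measure over held-out folds (plain KFold for a clusterer).
cv_search = GridSearchCV(pipe, params, scoring=scorer, cv=5).fit(X, y)

# One "split" whose train and test sets are both the whole dataset, so each
# candidate is fit and scored on the same samples, as in the for-loop.
all_idx = np.arange(len(y))
same_data_search = GridSearchCV(pipe, params, scoring=scorer,
                                cv=[(all_idx, all_idx)]).fit(X, y)

# The plain for-loop, for comparison: fit and score on all samples.
loop_scores = {}
for p in params["sPT__percentile"]:
    tfidf = TfidfTransformer(smooth_idf=True).fit_transform(X)
    selected = SelectPercentile(f_classif, percentile=p).fit_transform(tfidf, y)
    labels = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit(selected).labels_
    loop_scores[p] = v_measure_score(y, labels)

print("5-fold CV best:", cv_search.best_params_, cv_search.best_score_)
print("same-data best:", same_data_search.best_params_, same_data_search.best_score_)
print("for-loop best: ", max(loop_scores.items(), key=lambda kv: kv[1]))
```

If the same-data search and the loop agree while the 5-fold search does not, the gap comes from cross-validation rather than from a bug in either piece of code. The printed values depend on the synthetic data and the library version.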

v_measure_score with a for-loop

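# apiVectorArray, yTarget and clusterNum as above; range replaces Python 2's xrange.
# precompute_distances is dropped: it only affected speed and was removed in scikit-learn 1.0.
# The TF-IDF step does not depend on the percentile, so it is computed once.
transformer = TfidfTransformer(smooth_idf=True)
apiVectorArrayTFIDF = transformer.fit_transform(apiVectorArray)

bestPercent = [-1, -1]  # [best percentile, best v-measure]
for percent in range(100, 0, -1):
    apiVectorFit = SelectPercentile(f_classif, percentile=percent).fit(apiVectorArrayTFIDF, yTarget)
    k_means = cluster.KMeans(n_clusters=clusterNum, random_state=0).fit(apiVectorFit.transform(apiVectorArrayTFIDF))
    # Compute the score once; here k-means is fit and scored on the same samples.
    score = v_measure_score(yTarget, k_means.labels_)
    if score > bestPercent[1]:
        bestPercent[0] = percent
        bestPercent[1] = score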
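The loop can also be rewritten to compute what GridSearchCV reports. This is a minimal sketch under assumptions: the synthetic count data, the short percentile list and n_init=10 are hypothetical stand-ins, and the folds are an unshuffled 5-fold KFold, which is what GridSearchCV uses for a pipeline ending in a clusterer when cv=5 is given. For each percentile it fits TF-IDF, the selector and k-means on the training folds only, assigns the held-out fold with KMeans.predict, and averages the v-measure over the folds; these means should agree with GridSearchCV's cv_results_["mean_test_score"].

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.metrics import make_scorer, v_measure_score
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.pipeline import Pipeline

# Hypothetical synthetic data standing in for apiVectorArray / yTarget.
rng = np.random.RandomState(0)
y = np.tile([0, 1, 2], 30)
lam = np.ones((90, 30))
for c in range(3):
    lam[y == c, c * 10:(c + 1) * 10] += 3.0
X = rng.poisson(lam)
n_clusters = 3
percentiles = [10, 30, 50, 100]
folds = KFold(n_splits=5)  # unshuffled, so the splits are deterministic

# For-loop that reproduces GridSearchCV's cross-validated score.
cv_loop_scores = {}
for p in percentiles:
    fold_scores = []
    for train, test in folds.split(X):
        tfidf = TfidfTransformer(smooth_idf=True).fit(X[train])
        train_t = tfidf.transform(X[train])
        selector = SelectPercentile(f_classif, percentile=p).fit(train_t, y[train])
        km = KMeans(n_clusters=n_clusters, random_state=0, n_init=10).fit(selector.transform(train_t))
        # Held-out documents go to the nearest centre learned on the training folds.
        pred = km.predict(selector.transform(tfidf.transform(X[test])))
        fold_scores.append(v_measure_score(y[test], pred))
    cv_loop_scores[p] = np.mean(fold_scores)

# The equivalent GridSearchCV run, using the same folds.
pipe = Pipeline([
    ("tfIdf", TfidfTransformer(smooth_idf=True)),
    ("sPT", SelectPercentile(f_classif)),
    ("kmeans", KMeans(n_clusters=n_clusters, random_state=0, n_init=10)),
])
search = GridSearchCV(pipe, {"sPT__percentile": percentiles},
                      scoring=make_scorer(v_measure_score), cv=folds).fit(X, y)

for prm, score in zip(search.cv_results_["params"], search.cv_results_["mean_test_score"]):
    p = prm["sPT__percentile"]
    print(p, score, cv_loop_scores[p])
```

In this setting the selector and the cluster centres only see the training folds, and each score comes from documents the pipeline was not fit on, so this cross-validated value and the train-on-everything score of the original loop need not agree, nor need they pick the same percentile.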


Thanks.
