Reputation: 25
I recently started working in machine learning and related topics using Python. Today I am working on a dataset to which I would like to apply a dimension reduction before fitting my model and evaluating its score. This dataset has 30 features.
I start with a simple algorithm, logistic regression, but before applying it I want to do a PCA. To determine the best number of components I used GridSearchCV over a pipeline made of the PCA (tuning the number of components) and the logistic regression (tuning only the C parameter). The result I get is that the more components I use in the PCA, the better the precision score: for example, with n_components=30 I get a precision score of 0.81.
The problem is that I thought PCA was used for dimension reduction (i.e. working with fewer features) and that it could help increase the score. Is there something I am not understanding?
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

pca = PCA()
logistic = LogisticRegression()
pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])
param_grid = {
    'pca__n_components': [5, 10, 15, 20, 25, 30],
    'logistic__C': [0.01, 0.1, 1, 10, 100]
}
search = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1, scoring='precision')
search.fit(X_train, y_train)
print("Best parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)
results = pd.DataFrame(search.cv_results_)
Output: Best parameter (CV score=0.881): {'logistic__C': 0.01, 'pca__n_components': 30}
Thanks in advance for your reply
EDIT: I added this screenshot for more information on how the score varies with the number of components.
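For reference, here is a minimal sketch of how such a score-versus-components curve can be plotted from the results DataFrame built above (the column names follow scikit-learn's cv_results_ convention):

import matplotlib.pyplot as plt

# Best cross-validated precision for each number of components,
# taking the best value of C at each point.
best_by_ncomp = results.groupby('param_pca__n_components')['mean_test_score'].max()
plt.plot(best_by_ncomp.index.astype(int), best_by_ncomp.values, marker='o')
plt.xlabel('Number of PCA components')
plt.ylabel('Mean CV precision')
plt.show()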
Upvotes: 0
Views: 525
Reputation: 998
In general, when you do dimension reduction, you lose some information. It is not surprising, then, that you get a higher score with the full set of PCA features. Working with fewer features can indeed help increase the score, but not necessarily, and there are other good reasons for using PCA for dimension reduction. Here are the main advantages of PCA:
PCA is a good technique for dimension reduction (with its own limitations) in the sense that it concentrates the variance of the dataset in the first dimensions of the computed new space. Hence, dropping the last features is done at a minimal cost in terms of information carried by the dataset (under certain hypotheses). Using PCA for dimension reduction mitigates the risk of overfitting by limiting the number of features while losing a minimal amount of information. In this sense, fewer features can increase the score by avoiding overfitting, but that is not always true.
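To illustrate (a minimal sketch, not tied to your data), the share of variance carried by each component can be read from explained_variance_ratio_, and the number of components needed to keep a given fraction of the variance follows directly:

import numpy as np
from sklearn.decomposition import PCA

pca = PCA().fit(X_train)                       # X_train as in your question
cumulative = np.cumsum(pca.explained_variance_ratio_)
# smallest number of components keeping, say, 95% of the variance
n_components_95 = int(np.argmax(cumulative >= 0.95)) + 1
print(n_components_95)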
Dimension reduction with PCA can also be useful when working with noisy data. PCA does not directly eliminate the noise, but the first few features will have a higher signal-to-noise ratio since the variance of the dataset is concentrated there. The last features may then be dominated by noise and can be dropped.
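A rough sketch of that idea (the cut-off of 10 components below is an arbitrary choice for illustration): project onto the first components and map back with inverse_transform, which reconstructs the data without the variance carried by the dropped, noise-dominated components.

from sklearn.decomposition import PCA

pca = PCA(n_components=10)                     # arbitrary cut-off, for illustration only
X_reduced = pca.fit_transform(X_train)
X_denoised = pca.inverse_transform(X_reduced)  # reconstruction without the last components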
Since PCA projects the dataset onto a new orthonormal basis, the new features will all be linearly uncorrelated with each other. Many machine learning algorithms rely on this property to achieve optimal performance.
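You can check this directly: the covariance matrix of the transformed data is diagonal up to numerical precision, i.e. the off-diagonal (cross-feature) covariances are essentially zero.

import numpy as np
from sklearn.decomposition import PCA

X_new = PCA().fit_transform(X_train)
cov = np.cov(X_new, rowvar=False)              # columns of X_new are the new features
off_diag = cov - np.diag(np.diag(cov))
print(np.abs(off_diag).max())                  # close to zero: the components are uncorrelated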
Of course, PCA should not be used blindly in every case, as it has its own hypotheses and limitations. Here are what I consider the main ones (non-exhaustive):
PCA is sensitive to the scaling of the variables. As an example, if you have a temperature column in your dataset, you will get a different transformation depending on whether you use Celsius or Fahrenheit as the unit, because their scales are different. When the variables have different scales, PCA is somewhat arbitrary. This can be corrected by scaling all variables to unit variance, but at the cost of modifying (compressing or expanding) the fluctuations of the variables in all dimensions.
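With scikit-learn, this is typically handled by putting a StandardScaler in front of the PCA; a sketch based on your pipeline:

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipe = Pipeline(steps=[
    ('scale', StandardScaler()),               # puts every feature on unit variance
    ('pca', PCA()),
    ('logistic', LogisticRegression()),
])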
PCA captures linear correlations between the features but fails to capture non-linear correlations.
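If you suspect non-linear structure, a kernelised variant such as KernelPCA can be tried as a drop-in replacement (the kernel and number of components below are arbitrary illustrative choices):

from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=10, kernel='rbf')  # arbitrary settings, for illustration only
X_kpca = kpca.fit_transform(X_train)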
What would be interesting in your case is to compare the score obtained with and without the PCA transformation. You would then see whether there is a benefit in using it.
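A minimal sketch of that comparison, using cross-validated precision with and without the PCA step (reusing X_train and y_train from your question; the n_components=20 is just a placeholder):

from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

with_pca = Pipeline([('pca', PCA(n_components=20)), ('logistic', LogisticRegression())])
without_pca = LogisticRegression()

score_with = cross_val_score(with_pca, X_train, y_train, cv=5, scoring='precision').mean()
score_without = cross_val_score(without_pca, X_train, y_train, cv=5, scoring='precision').mean()
print(score_with, score_without)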
Last but not least, your plot shows an interesting thing: the gain in score between 20 and 30 features is very small (about 1%?). You may wonder whether it is worth keeping ten additional features for such a small gain. Indeed, keeping more features increases the risk of ending up with a model that generalizes less well. Cross-validation already mitigates this risk, but there is no guarantee that the unseen data you apply the model to will have exactly the same properties as your training dataset.
Upvotes: 0