Deniss

Reputation: 21

Using pipeline, SMOTE, and GridSearchCV together

I write this code:

from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline   # imblearn's Pipeline resamples only the training folds
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

LR = LogisticRegression()

pipe_lr = Pipeline([
    ('oversampling', SMOTE()),
    ('LR', LR)
])

C_list_lr = [0.001, 0.01, 0.1, 1, 10, 100]
solver_list_lr = ['liblinear', 'newton-cg', 'saga']
penalty_list_lr = [None, 'elasticnet', 'l1', 'l2']
max_iter_list_lr = [100, 1000, 3000]
random_state_list_lr = [None, 20, 42]
param_grid_lr = {
    'LR__C': C_list_lr, 
    'LR__solver': solver_list_lr,
    'LR__penalty': penalty_list_lr,
    'LR__max_iter': max_iter_list_lr,
    'LR__random_state': random_state_list_lr
}

grid_lr = GridSearchCV(pipe_lr, param_grid_lr, cv=5, scoring='accuracy', return_train_score=False)
grid_lr.fit(x1_train, y1_train)

I have two questions:

  1. Is the code correct?
  2. Is it normal to obtain a lower accuracy score this way than when simply using LogisticRegression with parameters I chose myself, without oversampling?

I work with a dataset containing 4024 samples. It is a binary classification problem with ~3400 examples in one class and just 624 in the other. When I ran the same algorithm on the dataset without any over/under-sampling, I got an accuracy of 0.89, but after oversampling and GridSearchCV only 0.83.

Upvotes: 2

Views: 544

Answers (1)

Alexander L. Hayes

Reputation: 4273

Brief answers:

  1. The code does not make any egregious errors: using a Pipeline helps avoid most of the worst mistakes. However, the parameter grid contains invalid combinations (e.g. penalty="elasticnet" requires an l1_ratio, and liblinear/newton-cg do not support every penalty in the grid), so many fits will fail and fill cv_results_ with NaN scores. I've added suggestions below.
  2. This is possible, but remember: accuracy is not a good metric on imbalanced learning problems since it is sensitive to class proportions; on your data, always predicting the majority class already scores about 3400/4024 ≈ 0.85 (see the sanity check below). SMOTE also modifies the feature space during learning, so simpler baselines like ROS/RUS are worth testing (a sketch follows the grid-search example).
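
For instance, a minimal sanity check (assuming the x1_train and y1_train from the question) shows what accuracy alone hides on data this imbalanced:

from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# A constant majority-class predictor scores ~3400/4024 ≈ 0.845 accuracy,
# but only 0.5 balanced accuracy.
dummy = DummyClassifier(strategy="most_frequent").fit(x1_train, y1_train)
print(accuracy_score(y1_train, dummy.predict(x1_train)))
print(balanced_accuracy_score(y1_train, dummy.predict(x1_train)))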

Here's a grid search using the saga solver (which supports all penalty parameters) that selects for balanced accuracy:

from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

pipe_lr = Pipeline([
    ('scale', StandardScaler()),     # L2 penalty is problematic with unscaled features
    ('oversampling', SMOTE()),
    ('LR', LogisticRegression(solver="saga")),
])

param_grid_lr = {
    'LR__C': [0.001, 0.01, 0.1],
    'LR__l1_ratio': [0.2, 0.4, 0.6, 0.8],
    'LR__penalty': [None, 'elasticnet', 'l1', 'l2'],
}

grid_lr = GridSearchCV(pipe_lr, param_grid_lr, cv=5, scoring='balanced_accuracy', verbose=3)

import warnings
with warnings.catch_warnings():
    # Warning filter is optional, but fit will warn when parameters go unused.
    warnings.simplefilter("ignore")
    grid_lr.fit(X, y)

print(grid_lr.best_params_)
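
As a baseline comparison, here is a sketch (same assumed imports and X, y as above) that swaps SMOTE for RandomOverSampler, which duplicates existing minority samples instead of synthesizing new ones:

from imblearn.over_sampling import RandomOverSampler

pipe_ros = Pipeline([
    ('scale', StandardScaler()),
    ('oversampling', RandomOverSampler(random_state=0)),
    ('LR', LogisticRegression(solver="saga")),
])

grid_ros = GridSearchCV(pipe_ros, param_grid_lr, cv=5, scoring='balanced_accuracy')
grid_ros.fit(X, y)
print(grid_ros.best_score_, grid_lr.best_score_)  # ROS vs. SMOTE best CV scores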

Upvotes: 1
