Reputation: 21
I'm stuck trying to fix an issue. Here is what I'm trying to do:
I'd like to predict the missing (NaN) values of a categorical feature using logistic regression. Here is my code:
df_1: my dataset, with missing values only in the "Metier" feature (these are the values I'm trying to predict)
X_train = pd.get_dummies(df_1[df_1['Metier'].notnull()].drop(columns='Metier'), drop_first=True)
X_test = pd.get_dummies(df_1[df_1['Metier'].isnull()].drop(columns='Metier'), drop_first=True, dummy_na=True)
Y_train = df_1[df_1['Metier'].notnull()]['Metier']
Y_test = df_1[df_1['Metier'].isnull()]['Metier']
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)
classifier.score(X_train, Y_train)  # 0.705112088833019
BUT when I try to get predictions on the test rows, it says:
ValueError: X has 42 features per sample; expecting 1423
I would highly appreciate it if someone could give me a hand.
Thanks a lot :)
Upvotes: 2
Views: 2453
Reputation: 29732
A rule of thumb is to never call pandas.get_dummies
separately on multiple dataframes: it does not guarantee that the results have the same columns, and hence the same dimensions.
import pandas as pd
print(pd.get_dummies(['a', 'b', 'c']))
   a  b  c
0  1  0  0
1  0  1  0
2  0  0  1
print(pd.get_dummies(['b', 'c']))
   b  c
0  1  0
1  0  1
It is only safe if you call pandas.get_dummies
on the full dataframe first and then split the result into x_train
and x_test
. Better still, you can use sklearn.preprocessing.OneHotEncoder
:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder(sparse_output=False)  # sparse=False in scikit-learn < 1.2
ohe.fit_transform(np.reshape(['a', 'b', 'c'], (-1, 1)))
array([[1., 0., 0.],
[0., 1., 0.],
[0., 0., 1.]])
ohe.transform(np.reshape(['b', 'c'], (-1, 1)))  # transform, NOT fit_transform: reuses the categories learned above
array([[0., 1., 0.],
[0., 0., 1.]])
Notice that the two different inputs now produce the same number of columns.
Upvotes: 1