Udhai kumar

How to fit one-hot encoded classes in a scikit-learn model

I am trying to classify multi-class data using scikit-learn's logistic regression. I encoded the classes with a one-hot encoder, but when I try to fit the model I get a "bad input shape" error. Is it possible to use one-hot encoded labels with sklearn's LogisticRegression?

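import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer

# y (list of class labels) and df2 (DataFrame with an "order_description" column) come from my own data
# note: on scikit-learn >= 1.2 this argument is named sparse_output instead of sparse
onehot_encoder = OneHotEncoder(sparse=False)
y = np.array(y)
# one-hot encode the labels into an (n_samples, n_classes) matrix
ok = onehot_encoder.fit_transform(y.reshape(-1, 1))
# bag-of-words features from the text column
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df2["order_description"])
LogisticRegression().fit(X, ok)  # fails here with the "bad input shape" ValueError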

Each encoded label (one row of ok) looks like: [0, 0, 1, 0]
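A quick way to see the mismatch is scikit-learn's sklearn.utils.multiclass.type_of_target helper, which reports how a label array will be interpreted. The sketch below is only an illustration under assumptions: the labels are hypothetical toy values, and the one-hot matrix is built with NumPy instead of OneHotEncoder so it runs regardless of the sparse/sparse_output naming in your scikit-learn version.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Hypothetical toy labels standing in for the question's y (four distinct classes)
y = np.array(["a", "b", "c", "d", "c"])

# One-hot encode by hand: one column per class, in sorted class order
classes = np.unique(y)
ok = (y[:, None] == classes).astype(int)

print(y.shape, type_of_target(y))    # (5,) multiclass
print(ok.shape, type_of_target(ok))  # (5, 4) multilabel-indicator
```

The 1-D array is treated as a multi-class target, while the 2-D 0/1 matrix is treated as a multi-label indicator matrix, which LogisticRegression does not accept as y.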

Answers (1)

Vivek Kumar

If your problem is multi-class, don't one-hot encode the labels. Scikit-learn handles binary and multi-class labels on its own, without any preprocessing by the user, so pass the original 1-D label array directly:

clf = LogisticRegression().fit(X, y)  # y: the original 1-D labels, not the one-hot matrix
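Here is a minimal end-to-end sketch of that approach. The descriptions and labels are hypothetical toy data standing in for df2["order_description"] and y. It also shows one way, assuming your labels are already one-hot encoded, to turn them back into a 1-D array of class indices with NumPy's argmax before calling fit.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for df2["order_description"] and y
descriptions = [
    "red cotton shirt", "blue denim jeans", "leather running shoes",
    "green cotton shirt", "black denim jeans", "white running shoes",
]
y = ["shirt", "jeans", "shoes", "shirt", "jeans", "shoes"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(descriptions)

# Pass the raw 1-D labels; scikit-learn detects the multi-class problem itself
clf = LogisticRegression().fit(X, y)
print(clf.classes_)  # ['jeans' 'shirt' 'shoes']

# Predictions come back as the original label strings
pred = clf.predict(vectorizer.transform(["yellow cotton shirt"]))

# If the labels are already one-hot encoded (like ok in the question),
# argmax over each row recovers 1-D class indices that fit() accepts
ok = np.array([[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 1, 0]])
y_idx = ok.argmax(axis=1)  # array([2, 0, 2])
```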

Also, a one-hot encoded label matrix has a different meaning in scikit-learn: it is interpreted as a label-indicator matrix, which signals a multi-label problem (where a sample can have more than one target label, for example movie genre prediction) rather than a multi-class one.
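To illustrate the multi-label case, here is a minimal sketch using hypothetical movie-plot data. MultiLabelBinarizer builds the label-indicator matrix (one column per genre), and because LogisticRegression itself expects a 1-D target, it is wrapped in OneVsRestClassifier, which fits one binary classifier per column. The data and variable names are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each movie can belong to several genres at once
plots = [
    "a detective solves a murder",
    "two friends fall in love",
    "a funny detective chases a thief",
    "a romantic comedy about a wedding",
]
genres = [["crime"], ["romance"], ["crime", "comedy"], ["romance", "comedy"]]

# MultiLabelBinarizer builds the label-indicator matrix: one 0/1 column per genre
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(genres)
print(mlb.classes_)  # ['comedy' 'crime' 'romance']

X = CountVectorizer().fit_transform(plots)

# Wrap LogisticRegression so one binary model is fitted per indicator column
clf = OneVsRestClassifier(LogisticRegression()).fit(X, Y)
Y_pred = clf.predict(X)  # 2-D 0/1 array with the same shape as Y
```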
