Reputation: 43
I am trying to classify multi-class data using scikit-learn's logistic regression. I encoded the class labels with a one-hot encoder, but when I try to fit the model I get a "bad input shape" error. Is it possible to use one-hot encoded labels with sklearn's LogisticRegression?
from sklearn.preprocessing import OneHotEncoder
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer
onehot_encoder = OneHotEncoder(sparse=False)
y = np.array(y)
ok = onehot_encoder.fit_transform(y.reshape(len(y),1))
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df2["order_description"])
LogisticRegression().fit(X,ok)
Input: Y - "[0,0,1,0]"
Upvotes: 0
Views: 587
Reputation: 36619
If your problem is multi-class, then don't use the one-hot encoded form. Scikit-learn handles binary and multi-class labels on its own, without any preprocessing by the user, so pass the original 1-D label array directly:
clf = LogisticRegression().fit(X,y)
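A one-hot encoded matrix of labels has a different meaning in scikit-learn. It is interpreted as a label-indicator matrix, which signals a multi-label problem (where each sample can have more than one target label, for example movie genre prediction) rather than a multi-class one. LogisticRegression expects a 1-D target, which is why fitting on the 2-D one-hot array fails with the bad input shape error.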
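As a minimal sketch of the point above: the toy `texts` and `y` below are made-up stand-ins for `df2["order_description"]` and the asker's labels, not the real data. It fits on the 1-D integer labels, and also shows one way to recover 1-D labels with `argmax` if only the one-hot matrix is available (this assumes exactly one 1 per row).

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for df2["order_description"] and y
texts = [
    "red apple fruit", "green apple fruit",
    "fast car engine", "slow car engine",
    "blue sky cloud", "grey sky cloud",
]
y = np.array([0, 0, 1, 1, 2, 2])  # 1-D class labels, no one-hot encoding

X = CountVectorizer().fit_transform(texts)

# Fit directly on the 1-D labels; multi-class is handled internally
clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

# If only a one-hot matrix is available, convert it back to class indices.
# np.eye(3)[y] builds a one-hot matrix here purely for illustration.
onehot = np.eye(3, dtype=int)[y]
y_recovered = onehot.argmax(axis=1)
```

If the labels were encoded with `OneHotEncoder`, its `inverse_transform` method should also map the matrix back to the original label values.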
Upvotes: 1