Reputation: 481
I am doing a classification exercise and facing a target with more than 2 categorical classes. I have encoded those classes using the LabelEncoder. The only problem is, I believe I might have to use one-hot encoding afterwards, as I no longer have only 0 and 1 but 0, 1, 2, 3. The reality is, I just do not know whether KNN or the Decision Tree would accept those numbers as classes. If not, can someone tell me what to do?
Good       1
bad        3
medium     2
excellent  0
I guess my real question is: can these be used directly as classes for my target, or do I need further engineering?
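For reference, a minimal sketch of the encoding step described above (assuming sklearn's LabelEncoder; note the exact integer assigned to each label depends on the sort order of the class names):

```python
from sklearn.preprocessing import LabelEncoder

labels = ["Good", "bad", "medium", "excellent"]

le = LabelEncoder()
codes = le.fit_transform(labels)  # one integer code per label, 0..3

# the original strings can always be recovered from the codes
recovered = le.inverse_transform(codes)
```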
Upvotes: 0
Views: 1401
Reputation: 11917
Most models in sklearn support a multiclass target without one-hot encoding; KNN and DecisionTree support it as well.
Let's verify this with a toy example:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
f = [[1, 2], [3.2, 4.5], [2.0, 0.75], [0.25, 3.68]]
t = [1, 3, 2, 0]
lr = LogisticRegression().fit(f, t)
d = DecisionTreeClassifier().fit(f, t)
r = RandomForestClassifier().fit(f, t)
n = KNeighborsClassifier(n_neighbors=3).fit(f, t)
lr.predict(f) # array([3, 3, 2, 0])
d.predict(f) # array([3, 3, 2, 0])
r.predict(f) # array([3, 3, 2, 0])
n.predict(f) # array([0, 0, 0, 0])
As you can see, all of them handle a multiclass target without any one-hot encoding.
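To close the loop with the string labels from the question, a hedged sketch (the feature values are the toy ones from above; sklearn classifiers would in fact also accept the string labels directly, but this shows the encode/decode round trip):

```python
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

X = [[1, 2], [3.2, 4.5], [2.0, 0.75], [0.25, 3.68]]  # toy features
y_str = ["Good", "bad", "medium", "excellent"]

le = LabelEncoder()
y = le.fit_transform(y_str)  # integer classes for training

clf = DecisionTreeClassifier().fit(X, y)

# map integer predictions back to the original string labels
pred = le.inverse_transform(clf.predict(X))
```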
If you want to use a neural net, you may need to one-hot encode the labels, depending on the loss function you use.
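For that case, a minimal sketch of turning the integer codes into a one-hot matrix with sklearn's LabelBinarizer (using the toy target from above):

```python
from sklearn.preprocessing import LabelBinarizer

t = [1, 3, 2, 0]  # integer-encoded target

lb = LabelBinarizer()
onehot = lb.fit_transform(t)  # shape (4, 4): one column per class
# e.g. onehot[0] is [0, 1, 0, 0], i.e. class 1
```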
Upvotes: 1