Camue

Reputation: 481

Classification: Target with more than 2 classes

I am doing a classification exercise and my target has more than 2 categorical classes. I encoded those classes using LabelEncoder. The only problem is, I believe I might need one-hot encoding afterwards, as I no longer have only 0 and 1 but 0, 1, 2, 3. The truth is, I just do not know whether KNN or a decision tree would accept those numbers as classes. If not, can someone tell me what to do?

Here is my original target:

Good 
bad
medium
excellent

I changed it to the following:

1
3
2
0

I guess my real question is: can these integers be used directly as target classes, or do they need further engineering?
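For reference, here is a minimal pure-Python sketch of what LabelEncoder does under the hood: it sorts the unique labels and maps each to its index, so the exact integers you get depend only on that sort order. (The lowercase labels below are hypothetical; capitalised labels like "Good" sort before lowercase ones and would produce different integers.)

```python
def label_encode(labels):
    """Mimic sklearn's LabelEncoder: map each label to the index
    of its position among the sorted unique labels."""
    classes = sorted(set(labels))  # e.g. ['bad', 'excellent', 'good', 'medium']
    mapping = {c: i for i, c in enumerate(classes)}
    return [mapping[c] for c in labels]

print(label_encode(['good', 'bad', 'medium', 'excellent']))  # [2, 0, 3, 1]
```

The encoding is arbitrary but consistent: the same label always maps to the same integer, which is all a classifier's target needs.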

Upvotes: 0

Views: 1401

Answers (1)

Sreeram TP

Reputation: 11917

Most models in sklearn support multiclass targets without one-hot encoding. KNN and DecisionTree support it as well.

Let's use a toy example to verify this,

from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

f = [[1, 2], [3.2, 4.5], [2.0, 0.75], [0.25, 3.68]]

t = [1,
3,
2,
0]

lr = LogisticRegression().fit(f, t)
d = DecisionTreeClassifier().fit(f, t)
r = RandomForestClassifier().fit(f, t)
n = KNeighborsClassifier(n_neighbors=3).fit(f, t)

lr.predict(f)  # array([3, 3, 2, 0]) -- regularised, so not a perfect fit on the training data
d.predict(f)   # array([1, 3, 2, 0]) -- a fully grown tree fits all four training samples exactly
r.predict(f)   # typically array([1, 3, 2, 0]); bootstrap sampling adds some randomness
n.predict(f)   # array([0, 0, 0, 0]) -- every 3-neighbour vote ties, and ties break towards the smallest label

As you can see, all of them fit and predict a multiclass integer target without any one-hot encoding.

If you want to use a neural network, then you may need to one-hot encode the labels depending on the loss function you use (for example, categorical cross-entropy expects one-hot targets, while sparse categorical cross-entropy accepts integer labels directly).
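If you do end up needing it, one-hot encoding is simple enough to sketch by hand (a hypothetical helper, no library assumed): label k becomes a row of zeros with a 1 at index k.

```python
def one_hot(labels, n_classes):
    """Turn integer class labels into one-hot rows:
    label k -> a row of length n_classes with a 1 at index k."""
    return [[1 if i == lab else 0 for i in range(n_classes)] for lab in labels]

print(one_hot([1, 3, 2, 0], 4))
# [[0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [1, 0, 0, 0]]
```

In practice you would use sklearn's OneHotEncoder or keras.utils.to_categorical instead, but the transformation is exactly this.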

Upvotes: 1
