Reputation: 535
I've been following a tutorial to learn machine learning, reproducing the instructor's steps as I go.
My array is:
0 44 72000
2 27 48000
1 30 54000
2 38 61000
1 40 6.377777777777778101
0 35 58000
2 38.77777777777777857 52000
0 48 79000
1 50 83000
0 37 67000
The first column originally contained country names, but he used LabelEncoder to transform them into 0s, 1s and 2s.
He also wanted to use OneHotEncoder to expand that column into several features. His videos are a bit outdated, so he passed categorical_features to OneHotEncoder, but OneHotEncoder has changed in my sklearn version and that parameter no longer exists.
So how can I apply OneHotEncoder to that specific feature now?
What he tried was:
onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()
Upvotes: 0
Views: 2072
Reputation: 155
One-hot encoding is based on categories: each class is represented by a one-hot vector. For instance, if you have 2 classes, the vector has length 2:
[_,_]
Each class is then represented using only 0s and 1s: the position at the class's index is 1 and all others are 0. For instance, Class0 will be:
[1,0]
Class1 will be:
[0,1]
In your example you have 3 classes, so the one-hot vector has length 3. Each class is represented like this:
Class0 -> [1,0,0]
Class1 -> [0,1,0]
Class2 -> [0,0,1]
Then your array will look like:
[1,0,0] 44 72000
[0,0,1] 27 48000
[0,1,0] 30 54000
[0,0,1] 38 61000
[0,1,0] 40 6.377777777777778101
[1,0,0] 35 58000
[0,0,1] 38.77777777777777857 52000
[1,0,0] 48 79000
[0,1,0] 50 83000
[1,0,0] 37 67000
I hope this clarifies things. You can write your own function to do that.
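A minimal sketch of such a function with NumPy, assuming the first column already holds integer codes 0..k-1 (as LabelEncoder would produce). The name one_hot_first_column is hypothetical, not part of any library, and only the first three rows of the array are used for brevity:

```python
import numpy as np

def one_hot_first_column(X):
    """Replace integer class codes in column 0 with one-hot columns."""
    X = np.asarray(X, dtype=float)
    codes = X[:, 0].astype(int)
    n_classes = codes.max() + 1         # assumption: codes run from 0 to k-1
    one_hot = np.eye(n_classes)[codes]  # row i of the identity matrix encodes class i
    # Put the one-hot columns in front of the remaining features
    return np.hstack([one_hot, X[:, 1:]])

X = np.array([
    [0, 44, 72000],
    [2, 27, 48000],
    [1, 30, 54000],
])
encoded = one_hot_first_column(X)
print(encoded)
```

Note that the result is a flat 2-D array with three new columns rather than a nested vector per row, which is the layout most estimators expect.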
Upvotes: 0
Reputation: 34
Assuming your data X has shape (n_rows, n_features), and you would like to apply one-hot encoding to, say, the first column, a quick approach would be:
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder()
# X[:, 0:1] keeps the first column 2-D, as OneHotEncoder expects
one_hot = onehotencoder.fit_transform(X[:,0:1]).toarray()
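The quick approach only gives you the encoded column, so you would still have to join it with the other features yourself. A rough sketch, assuming X is a numeric NumPy array and using np.hstack for the join (the name X_encoded is just illustrative):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# First three rows of the array from the question
X = np.array([
    [0, 44, 72000],
    [2, 27, 48000],
    [1, 30, 54000],
], dtype=float)

onehotencoder = OneHotEncoder()
# Encode only the first column; .toarray() converts the sparse result to dense
one_hot = onehotencoder.fit_transform(X[:, 0:1]).toarray()

# Stack the encoded columns in front of the untouched columns
X_encoded = np.hstack([one_hot, X[:, 1:]])
print(X_encoded)
```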
A better way to apply one-hot encoding to only a specific column is to use ColumnTransformer:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# Encode column 0; remainder='passthrough' keeps the other columns unchanged
ct = ColumnTransformer([("country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
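For reference, here is a small self-contained run of the ColumnTransformer approach on the first four rows of the array from the question. The sparse_threshold=0 argument is my own addition so that the result is always a dense array; it is not required for the approach to work.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# First four rows of the array from the question
X = np.array([
    [0, 44, 72000],
    [2, 27, 48000],
    [1, 30, 54000],
    [2, 38, 61000],
], dtype=float)

# sparse_threshold=0 is an added choice so the output is always dense
ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],
    remainder="passthrough",
    sparse_threshold=0,
)
X_encoded = ct.fit_transform(X)
print(X_encoded.shape)
print(X_encoded)
```

The encoded columns come first in the output, followed by the passthrough columns, so age and salary end up in columns 3 and 4.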
Upvotes: 1