i'mgnome

Reputation: 535

How can I apply OneHotEncoder to one column of an array?

I've been following a tutorial to understand machine learning, trying out what the instructor does as I go.

My array is:

0   44                      72000
2   27                      48000
1   30                      54000
2   38                      61000
1   40                      6.377777777777778101
0   35                      58000
2   38.77777777777777857    52000
0   48                      79000
1   50                      83000
0   37                      67000

The first column originally contained country names, but he used LabelEncoder to transform them into 0s, 1s and 2s.

He also wanted to use OneHotEncoder to expand that column into several features. His videos are a bit outdated, so he used the categorical_features parameter of OneHotEncoder, but OneHotEncoder has changed in my sklearn version and that parameter no longer exists.

So how can I use OneHotEncoder now on that specific feature?

What he tried was:

onehotencoder = OneHotEncoder(categorical_features = [0])
X = onehotencoder.fit_transform(X).toarray()

Upvotes: 0

Views: 2072

Answers (2)

Angerato

Reputation: 155

One-hot encoding is based on categories: each class is represented by a one-hot vector. For instance, if you have 2 classes, your vector has length 2:

[_,_]

Each class is then represented using only 0s and 1s: the index of the represented class takes 1 and all the others take 0. For instance, Class0 will be:

[1,0]

Class1 will be:

[0,1]

In your example, you have 3 classes, so your one-hot vector will have a length of 3. Each class is represented like this:

Class0 -> [1,0,0]
Class1 -> [0,1,0]
Class2 -> [0,0,1]

Then your array will look like:

[1,0,0]   44                      72000
[0,0,1]   27                      48000
[0,1,0]   30                      54000
[0,0,1]   38                      61000
[0,1,0]   40                      6.377777777777778101
[1,0,0]   35                      58000
[0,0,1]   38.77777777777777857    52000
[1,0,0]   48                      79000
[0,1,0]   50                      83000
[1,0,0]   37                      67000

I hope this clarifies your question. You can write your own function to do that.
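
For example, such a function could be sketched with NumPy as below. This is only an illustration: the name one_hot_first_column is hypothetical, the sample rows are the first three from the question, and the sketch assumes the first column already holds integer labels 0..k-1 (as produced by the label encoder).

```python
import numpy as np

def one_hot_first_column(X):
    """Replace column 0 (integer class labels 0..k-1) with k one-hot columns."""
    X = np.asarray(X, dtype=float)
    labels = X[:, 0].astype(int)
    n_classes = labels.max() + 1          # assumes labels start at 0
    one_hot = np.eye(n_classes)[labels]   # row i of the identity is the vector for class i
    return np.hstack([one_hot, X[:, 1:]])

# First three rows of the array from the question
X = np.array([
    [0, 44, 72000],
    [2, 27, 48000],
    [1, 30, 54000],
])
encoded = one_hot_first_column(X)
print(encoded)
```

Each row now starts with three 0/1 columns, followed by the untouched age and salary columns.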

Upvotes: 0

insomaniac79

Reputation: 34

Assume that your data X has shape (n_rows, features) and that you would like to apply one-hot encoding to, say, the first column. A quick approach would be:

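from sklearn.preprocessing import OneHotEncoder
# X[:, 0:1] keeps the first column 2-D, which OneHotEncoder expects
onehotencoder = OneHotEncoder()
one_hot = onehotencoder.fit_transform(X[:,0:1]).toarray()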
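
Note that one_hot here holds only the encoded column, so it still has to be joined back to the other features. A minimal sketch of that step, assuming a few sample rows from the question and using np.hstack (the name X_encoded is just illustrative):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A few rows from the question: [country label, age, salary]
X = np.array([
    [0, 44, 72000],
    [2, 27, 48000],
    [1, 30, 54000],
    [2, 38, 61000],
], dtype=float)

onehotencoder = OneHotEncoder()
# Encode only the first column; the result is sparse, so convert it to a dense array
one_hot = onehotencoder.fit_transform(X[:, 0:1]).toarray()

# Put the one-hot columns in front of the remaining (unencoded) columns
X_encoded = np.hstack([one_hot, X[:, 1:]])
print(X_encoded.shape)
```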

A better way to apply one-hot encoding to only a specific column is to use ColumnTransformer:

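from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# One-hot encode column 0; pass the remaining columns through unchanged
ct = ColumnTransformer([("country", OneHotEncoder(), [0])], remainder='passthrough')
X = ct.fit_transform(X)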
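
As a rough end-to-end check, here is the same idea run on a few rows from the question. The sample array and the name X_encoded are assumptions for illustration; sparse_threshold=0 is added only so that the result is always a dense NumPy array.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# A few rows from the question: [country label, age, salary]
X = np.array([
    [0, 44, 72000],
    [2, 27, 48000],
    [1, 30, 54000],
    [2, 38, 61000],
], dtype=float)

ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],
    remainder="passthrough",   # keep age and salary as they are
    sparse_threshold=0,        # always return a dense array
)
X_encoded = ct.fit_transform(X)
print(X_encoded)
```

The encoded columns come first in the output, followed by the passed-through columns in their original order.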

Upvotes: 1
