Encipher

Reputation: 2866

Applying one-hot encoding to a single column of a dataset, but the result was not as expected

I have a dataset with five columns.

Dataset:

Country       Population    Tourism    Mean_Age    Employed
Afghanistan  37172386       14000      17.3        Fulltime
Albania      2866376        5340000    36.2        Parttime

There are almost 1000 rows like this, and Employed is a categorical column. I want to represent the Employed column as a numerical column using one-hot encoding.

My code is:

from sklearn.preprocessing import OneHotEncoder
Employed_Status = data["Employed"]
encoder = OneHotEncoder()
encoder.fit(Employed_Status.values.reshape(-1, 1))
encoder.transform(Employed_Status.head().values.reshape(-1, 1)).todense()

Here data is the name of my data frame.

When I inspect the data frame after running the lines above, it is unchanged.

However, I expected to get something like this:

Country       Population    Tourism    Mean_Age    Employed
Afghanistan  37172386       14000      17.3        1
Albania      2866376        5340000    36.2        0

I expected this because I applied one-hot encoding to the Employed column.

Can anyone tell me why I got the same data frame back instead of the desired one?

Upvotes: 0

Views: 927

Answers (2)

user3252344

Reputation: 758

You're not saving the output. encoder.transform returns a new matrix; it does not modify data in place.

out = encoder.transform(...).todense()

data['employed'] = out

It may take some wrangling to get the datasets to go together. In the past I have needed pd.concat([numerical_in, categorical_encoded_in], axis=1) (note that pd.concat takes a list of frames), but you might find it simply works once you save the dense output.
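To make the "save the output, then concat" idea concrete, here is a minimal sketch. The two-row frame is a toy stand-in built from the rows in the question, and the names encoded, encoded_df and result are just illustrative. It assumes the encoder returns its default sparse output, which is why .toarray() is called.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the asker's frame (values copied from the question).
data = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Population": [37172386, 2866376],
    "Tourism": [14000, 5340000],
    "Mean_Age": [17.3, 36.2],
    "Employed": ["Fulltime", "Parttime"],
})

encoder = OneHotEncoder()
# fit_transform returns a new sparse matrix; `data` itself is not modified.
encoded = encoder.fit_transform(data[["Employed"]]).toarray()

# One column per category, named after the category, aligned on data's index.
encoded_df = pd.DataFrame(
    encoded,
    columns=encoder.categories_[0],
    index=data.index,
)

# Drop the original text column and attach the encoded columns.
result = pd.concat([data.drop(columns="Employed"), encoded_df], axis=1)
print(result)
```

Note that one-hot encoding produces one column per category (here Fulltime and Parttime), not a single 0/1 column, so the result will not look exactly like the table in the question.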

Upvotes: 0

Patricio Loncomilla

Reputation: 1103

You can do something like this:

data['Employed'] = data['Employed'].replace('Fulltime',1).replace('Parttime',0)
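Strictly speaking this is a binary mapping rather than one-hot encoding, but it does give the single 0/1 column shown in the question. An equivalent way to write it, sketched here on a toy frame and assuming Fulltime and Parttime are the only two values, is a dictionary passed to map (the name mapping is just illustrative):

```python
import pandas as pd

# Toy stand-in for the asker's frame (values copied from the question).
data = pd.DataFrame({
    "Country": ["Afghanistan", "Albania"],
    "Employed": ["Fulltime", "Parttime"],
})

# Explicit mapping; any value not listed here would become NaN.
mapping = {"Fulltime": 1, "Parttime": 0}
data["Employed"] = data["Employed"].map(mapping)
print(data)
```

Unlike replace, map turns unlisted values into NaN, which makes unexpected categories easy to spot with data["Employed"].isna().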

Upvotes: 1
