Vamsi Krishna

Reputation: 11

One hot encoding in Python

I'm trying to learn machine learning.

I have a question about one-hot encoding:
My data set is split across two Excel sheets: one holds the training data and the other holds the test data. I first trained my model by importing the training sheet with pandas. The data set has categorical features that need to be encoded, so I one-hot encoded them.

After importing the test data set, if I one-hot encode it, will the encoding be the same as for the training data set, or will it be different? If it is different, how can I solve this issue?

Upvotes: 1

Views: 2116

Answers (2)

Arshdeep Singh

Reputation: 527

You have two separate sheets (one for the train and one for the test data set). You have to one-hot encode both sheets separately after importing them into pandas data frames.

And yes, the one-hot encoding will be the same across the sheets, provided that the column contains the same set of categorical values in each sheet. If a category is missing from one sheet, the encoded columns will differ, so make sure the categorical values match.
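The caveat about matching categorical values can be illustrated with a minimal sketch. It assumes the encoding is done with `pd.get_dummies`, which the answer does not actually name, and it uses a made-up `color` column in place of the real sheets. When the test sheet lacks a category, one possible fix is to reindex the test columns onto the training columns:

```python
import pandas as pd

# Hypothetical toy data standing in for the two Excel sheets.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "red", "red"]})  # "blue" never appears

# Encode each sheet separately (dtype=int gives 0/1 instead of booleans).
train_encoded = pd.get_dummies(train, columns=["color"], dtype=int)
test_encoded = pd.get_dummies(test, columns=["color"], dtype=int)

# The column sets differ because the test sheet has no "blue" rows.
print(list(train_encoded.columns))
print(list(test_encoded.columns))

# Align the test columns to the training columns; categories missing
# from the test sheet become all-zero columns.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)
print(test_aligned)
```

Note that `reindex` also drops any test-only category that the model never saw during training, which may or may not be what you want.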

Upvotes: 0

jits_on_moon

Reputation: 837

One-hot encoding creates one binary attribute per category (that is, per distinct value): for each row, the attribute for its category equals 1 (hot), while all the others equal 0 (cold).

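Sample example:

from sklearn.preprocessing import OneHotEncoder

# df_object is assumed to be a 1-D NumPy array holding one categorical column
encoder = OneHotEncoder()
# the encoder expects 2-D input, hence reshape(-1, 1); the result is a sparse matrix
one_hot = encoder.fit_transform(df_object.reshape(-1, 1))
one_hot.toarray()  # dense view, as in the output below

Sample output: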

array([[0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0.],
       ...,
       [0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.]])

Before fitting the encoder, you need to check whether the values of the attribute you are one-hot encoding are relatively close to one another or not.
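For the train/test part of the question, one common approach with the same `OneHotEncoder` class is to fit it on the training data only and then reuse the fitted encoder on the test data, so both end up with identical columns. The sketch below is only an illustration: the `city` column and its values are made up, and `handle_unknown="ignore"` is one possible policy for categories that appear only in the test sheet.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for the two Excel sheets.
train = pd.DataFrame({"city": ["Delhi", "Mumbai", "Pune", "Delhi"]})
test = pd.DataFrame({"city": ["Pune", "Chennai"]})  # "Chennai" is unseen in train

# Learn the categories from the training data only.
encoder = OneHotEncoder(handle_unknown="ignore")
train_encoded = encoder.fit_transform(train[["city"]]).toarray()

# Reuse the fitted encoder: transform (not fit_transform) the test data.
test_encoded = encoder.transform(test[["city"]]).toarray()

print(encoder.categories_)  # categories learned from the training data
print(test_encoded)         # the unseen "Chennai" row is encoded as all zeros
```

Calling `fit_transform` again on the test data would relearn the categories from the test sheet and could produce a different column layout, which is the mismatch the question asks about.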

Upvotes: 1
