Reputation: 11
I'm trying to learn machine learning.
I have a question about one-hot encoding:
I have a data set split into two Excel sheets. One sheet has the training data and the other has the test data. I first trained my model by importing the training sheet with pandas. There are categorical features in the data set that have to be encoded, so I one-hot encoded them.
After importing the test data set, if I one-hot encode it, will the encoding be the same as for the training data set, or will it be different? If it is different, how can I solve this issue?
Upvotes: 1
Views: 2116
Reputation: 527
You have two separate sheets (one for the test data set and one for the train data set). After importing each of them into a pandas DataFrame, you have to one-hot encode both.
And yes, the one-hot encoding will be the same across the sheets, provided the column has the same set of categorical values in each sheet. If a category is missing from one sheet, the encoded columns will not match, so make sure both sheets contain the same categories.
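If the two sheets do not contain exactly the same categories, one way to bring the columns back in line is to reindex the encoded test frame to the training columns. This is only a minimal sketch: the `color` column and the two small frames are made-up stand-ins for the imported sheets, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-ins for the two imported sheets.
train = pd.DataFrame({"color": ["red", "green", "blue", "green"]})
test = pd.DataFrame({"color": ["green", "red", "red"]})  # "blue" never appears here

# Encode each frame separately; dtype=int gives 0/1 columns instead of booleans.
train_encoded = pd.get_dummies(train, columns=["color"], dtype=int)
test_encoded = pd.get_dummies(test, columns=["color"], dtype=int)

# Encoded on its own, the test frame has no "color_blue" column.
# Reindexing to the training columns adds it, filled with 0.
test_aligned = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)

print(list(train_encoded.columns))
print(list(test_aligned.columns))
```

Note that reindexing also drops any column that exists only in the test frame, which is usually what you want, since the model was never trained on it.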
Upvotes: 0
Reputation: 837
One-hot encoding creates one binary attribute per category (that is, per distinct value). For each row, one attribute is equal to 1 (hot), while the others are 0 (cold).
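Sample example:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder()
# df_object is assumed to hold a single categorical column (a pandas Series or a 1-D array)
# fit_transform returns a sparse matrix by default; toarray() converts it to a dense array
one_hot = encoder.fit_transform(np.asarray(df_object).reshape(-1, 1)).toarray()
one_hot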
Sample output:
array([[0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 1., 0.],
       ...,
       [0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 1., 0., 0., 0.]])
You need to check whether the values of the attribute you are fitting with the one-hot encoder are relatively close to one another or not.
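To keep the encoding identical between the train and test sheets, one option is to fit the encoder on the training data only and then reuse the same fitted encoder on the test data. The sketch below assumes a made-up `city` column. `handle_unknown="ignore"` is one possible choice for categories that appear only in the test data, not the only one.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-ins for the train and test sheets.
train = pd.DataFrame({"city": ["Paris", "Rome", "Oslo", "Rome"]})
test = pd.DataFrame({"city": ["Rome", "Lima"]})  # "Lima" is unseen in training

# With handle_unknown="ignore", an unseen category becomes an all-zero row
# instead of raising an error at transform time.
encoder = OneHotEncoder(handle_unknown="ignore")

# Learn the categories from the training data only...
train_encoded = encoder.fit_transform(train[["city"]]).toarray()
# ...and apply the same mapping to the test data (transform, not fit_transform).
test_encoded = encoder.transform(test[["city"]]).toarray()

print(encoder.categories_)  # categories learned from the training column
print(test_encoded)
```

Because the test data goes through `transform` only, its columns always match the training columns in number and order, even when some categories are missing from or new in the test sheet.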
Upvotes: 1