Ammar Aldulaimi

Reputation: 21

What is the best way to keep column names after doing OneHotEncoder in Python?

What is the best way to keep column names after one-hot encoding in Python? All my features are categorical. After importing the dataset, it looks like this:

 PlaceID       Date  ...  BlockedRet  OverallSeverity
0    23620  1/10/2019  ...           1                1
1    13352  1/10/2019  ...           1                1
2    13674  1/10/2019  ...           1                1
3    13501  1/10/2019  ...           1                1
4    13675  1/10/2019  ...           1                1

[5 rows x 28 columns]

After choosing the features, I want to transform them with a one-hot encoder, because most of them are categorical. I did that using:

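from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Note: categorical_features was deprecated in scikit-learn 0.20 and removed in 0.22,
# so this call only works on older versions.
hotencode = OneHotEncoder(categorical_features=[0])
features = hotencode.fit_transform(features).toarray()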

The result comes back without the original column names. How can I transform the features so that the new columns keep the original column name plus a suffix (name_0, name_1, name_2, name_3, ...)?

Upvotes: 2

Views: 2563

Answers (1)

Suhas_Pote

Reputation: 4590

Here is a simple example:

import pandas as pd

df = pd.DataFrame([
       ['green', 'Chevrolet', 2017],
       ['blue', 'BMW', 2015], 
       ['yellow', 'Lexus', 2018],
])
df.columns = ['color', 'make', 'year']

df

'''
    color       make  year
0   green  Chevrolet  2017
1    blue        BMW  2015
2  yellow      Lexus  2018
'''

Approach 1: One Hot Encoder

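from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Map each color string to an integer label (classes are sorted: blue=0, green=1, yellow=2)
le_color = LabelEncoder()
df['color_encoded'] = le_color.fit_transform(df.color)

# One-hot encode the integer labels; OneHotEncoder expects 2-D input, hence reshape(-1, 1)
color_ohe = OneHotEncoder()
X = color_ohe.fit_transform(df.color_encoded.values.reshape(-1, 1)).toarray()

# Name the new columns Color_<label> and append them to the original frame
dfOneHot = pd.DataFrame(X, columns=["Color_" + str(i) for i in range(X.shape[1])])
df = pd.concat([df, dfOneHot], axis=1)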

df

'''
    color       make  year  color_encoded  Color_0  Color_1  Color_2
0   green  Chevrolet  2017              1      0.0      1.0      0.0
1    blue        BMW  2015              0      1.0      0.0      0.0
2  yellow      Lexus  2018              2      0.0      0.0      1.0
'''

Reference:

https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html

Approach 2: Get Dummies

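# Applied to the original three-column df; dtype=int keeps 0/1 output (pandas >= 2.0 defaults to bool)
df_final = pd.concat([df, pd.get_dummies(df["color"], prefix="color", dtype=int)], axis=1)

df_final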

'''
    color       make  year  color_blue  color_green  color_yellow
0   green  Chevrolet  2017           0            1             0
1    blue        BMW  2015           1            0             0
2  yellow      Lexus  2018           0            0             1
'''
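When, as in the question, many columns are categorical, get_dummies can encode several of them in one call through its `columns` parameter; each listed column is replaced by "<column>_<value>" dummies. A minimal sketch, reusing the example frame; `dtype=int` is passed so the output is 0/1 rather than the bool default of pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame(
    [["green", "Chevrolet", 2017],
     ["blue", "BMW", 2015],
     ["yellow", "Lexus", 2018]],
    columns=["color", "make", "year"],
)

# Encode the listed columns; unlisted columns (year) are kept and placed first.
df_dummies = pd.get_dummies(df, columns=["color", "make"], dtype=int)
print(df_dummies)
```

One design difference worth keeping in mind: get_dummies has no memory of the categories it saw, so a test set with unseen or missing values yields a different set of columns, whereas a fitted OneHotEncoder stores `categories_` and reapplies the same mapping.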

Reference:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html

Upvotes: 6
