Change the column names after applying One Hot Encoding

Question

I am using OneHotEncoder to create a matrix based on the dataframe below.

CA    Mode       Avenue   
100   Cash       CC
101   Cheque     CC
103   Cash       DF
104   Digital    DF

When I use OneHotEncoder, the resulting dataframe looks like this:

CA     x0_Cash    x0_Cheque    x0_Digital   x1_CC    x1_DF
100      1           0              0          1       0
101      0           1              0          1       0
103      1           0              0          0       1
104      0           0              1          0       1

As we can see, the column names have been changed. I want column names such as Mode_Cash and Mode_Cheque, i.e. the original column name, followed by an underscore, and then the row value.

How can I achieve this? Any clue?

meenaparam · Accepted Answer

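You can use the encoder's get_feature_names method (named get_feature_names_out in scikit-learn 1.0 and later) to set the column names as you'd like. Pass it the original column names and it returns names of the form column_value.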

import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# set up the data
X = [[100, "Cash", "CC"], [101, "Cheque", "CC"], [103, "Cash", "DF"], [104, "Digital", "DF"]]

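# set up the OneHotEncoder with dense output
# (sparse_output=False needs scikit-learn 1.2+; older versions use sparse=False)
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

# fit the OneHotEncoder
enc.fit(X)

# define your column names
# (get_feature_names_out needs scikit-learn 1.0+; older versions use get_feature_names)
colnames = enc.get_feature_names_out(["CA", "Mode", "Avenue"])

# make your transformed dataframe (the encoder is already fitted, so transform is enough)
X_new = pd.DataFrame(enc.transform(X), columns=colnames)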

X_new
   CA_100  CA_101  CA_103  ...  Mode_Digital  Avenue_CC  Avenue_DF
0     1.0     0.0     0.0  ...           0.0        1.0        0.0
1     0.0     1.0     0.0  ...           0.0        1.0        0.0
2     0.0     0.0     1.0  ...           0.0        0.0        1.0
3     0.0     0.0     0.0  ...           1.0        0.0        1.0
