pythondumb

Reputation: 1247

Change the column names after applying One Hot Encoding

I am using OneHotEncoder to create a one-hot encoded matrix from the dataframe below.

CA    Mode       Avenue   
100   Cash       CC
101   Cheque     CC
103   Cash       DF
104   Digital    DF

When I use OneHotEncoder, the resulting dataframe looks like this:

CA     x0_Cash    x0_Cheque    x0_Digital   x1_CC    x1_DF
100      1           0              0          1       0
101      0           1              0          1       0
103      1           0              0          0       1
104      0           0              1          0       1

As we can see, the original column names have been replaced by generic prefixes. I want column names such as Mode_Cash, Mode_Cheque and Avenue_CC, i.e. the original column name, followed by an underscore, followed by the row value.

How can I achieve this?

Upvotes: 0

Views: 1271

Answers (2)

meenaparam

Reputation: 2019

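You can use the encoder's get_feature_names method (named get_feature_names_out in scikit-learn 1.0 and later) to build the column names you'd like: pass it the original column names and it returns one name per encoded column.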

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# set up the data
X = [[100, "Cash", "CC"], [101, "Cheque", "CC"], [103, "Cash", "DF"], [104, "Digital", "DF"]]

# set up the OneHotEncoder to return a dense array
# (on scikit-learn older than 1.2, the keyword is sparse=False)
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)

# fit the OneHotEncoder
enc.fit(X)

# define your column names
# (on scikit-learn older than 1.0, the method is get_feature_names)
colnames = enc.get_feature_names_out(["CA", "Mode", "Avenue"])

# make your transformed dataframe (the encoder is already fitted, so transform is enough)
X_new = pd.DataFrame(enc.transform(X), columns=colnames)

X_new
   CA_100  CA_101  CA_103  ...  Mode_Digital  Avenue_CC  Avenue_DF
0     1.0     0.0     0.0  ...           0.0        1.0        0.0
1     0.0     1.0     0.0  ...           0.0        1.0        0.0
2     0.0     0.0     1.0  ...           0.0        0.0        1.0
3     0.0     0.0     0.0  ...           1.0        0.0        1.0

Upvotes: 3

Allen Qin

Reputation: 19957

You can use pandas.get_dummies, which names the new columns as original column name, underscore, value by default:

Setup:

import pandas as pd

df = pd.DataFrame({'CA': {0: 100, 1: 101, 2: 103, 3: 104},
                   'Mode': {0: 'Cash', 1: 'Cheque', 2: 'Cash', 3: 'Digital'},
                   'Avenue': {0: 'CC', 1: 'CC', 2: 'DF', 3: 'DF'}})

Solution:

pd.get_dummies(df, columns=['Mode', 'Avenue'])

    CA  Mode_Cash   Mode_Cheque Mode_Digital    Avenue_CC   Avenue_DF
0   100 1           0           0               1           0
1   101 0           1           0               1           0
2   103 1           0           0               0           1
3   104 0           0           1               0           1
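
One caveat: the 0/1 values shown above depend on the pandas version. As far as I know, pandas 2.0 and later return boolean (True/False) dummy columns by default. A minimal sketch, assuming you want integer 0/1 columns regardless of version, passes dtype=int explicitly (the variable name dummies is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "CA": [100, 101, 103, 104],
    "Mode": ["Cash", "Cheque", "Cash", "Digital"],
    "Avenue": ["CC", "CC", "DF", "DF"],
})

# dtype=int requests 0/1 integer columns instead of the version-dependent default;
# prefix_sep="_" is the default separator and is spelled out only for clarity
dummies = pd.get_dummies(df, columns=["Mode", "Avenue"], prefix_sep="_", dtype=int)

print(dummies.columns.tolist())
print(dummies)
```

Columns not listed in columns= (here CA) are passed through unchanged and come first in the result.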

Upvotes: 0
