Reputation: 1247
I am using OneHotEncoder to create a matrix from the dataframe below.
CA Mode Avenue
100 Cash CC
101 Cheque CC
103 Cash DF
104 Digital DF
When I use OneHotEncoder, the resulting dataframe looks like this:
CA x0_Cash x0_Cheque x0_Digital x1_CC x1_DF
100 1 0 0 1 0
101 0 1 0 1 0
103 1 0 0 0 1
104 0 0 1 0 1
As we can see, the original column names have been replaced by generic prefixes (x0, x1). I want the column names to be Mode_Cash, Mode_Cheque, and so on, i.e. the original column name, followed by an underscore, followed by the category value.
How can I achieve this? Any hints?
Upvotes: 0
Views: 1271
Reputation: 2019
You can pass your own column names to the encoder's get_feature_names_out method (called get_feature_names in scikit-learn versions before 1.0) to get the column names you want.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
# set up the data
X = [[100, "Cash", "CC"], [101, "Cheque", "CC"], [103, "Cash", "DF"], [104, "Digital", "DF"]]
# set up the OneHotEncoder with a dense output
# (sparse_output=False on scikit-learn >= 1.2; use sparse=False on older versions)
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
# fit the OneHotEncoder
enc.fit(X)
# build "<column>_<category>" names from your own column names
colnames = enc.get_feature_names_out(["CA", "Mode", "Avenue"])
# make your transformed dataframe (the encoder is already fitted, so transform is enough)
X_new = pd.DataFrame(enc.transform(X), columns=colnames)
X_new
CA_100 CA_101 CA_103 ... Mode_Digital Avenue_CC Avenue_DF
0 1.0 0.0 0.0 ... 0.0 1.0 0.0
1 0.0 1.0 0.0 ... 0.0 1.0 0.0
2 0.0 0.0 1.0 ... 0.0 0.0 1.0
3 0.0 0.0 0.0 ... 1.0 0.0 1.0
Upvotes: 3
Reputation: 19957
You can use pandas.get_dummies, which produces exactly these column names by default:
Setup:
import pandas as pd

df = pd.DataFrame({'CA': {0: 100, 1: 101, 2: 103, 3: 104},
                   'Mode': {0: 'Cash', 1: 'Cheque', 2: 'Cash', 3: 'Digital'},
                   'Avenue': {0: 'CC', 1: 'CC', 2: 'DF', 3: 'DF'}})
Solution:
# dtype=int gives 0/1 columns; pandas >= 2.0 otherwise returns True/False
pd.get_dummies(df, columns=['Mode', 'Avenue'], dtype=int)
CA Mode_Cash Mode_Cheque Mode_Digital Avenue_CC Avenue_DF
0 100 1 0 0 1 0
1 101 0 1 0 1 0
2 103 1 0 0 0 1
3 104 0 0 1 0 1
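If you ever need a different naming scheme, get_dummies also accepts prefix (a string, list, or dict mapping column names to prefixes) and prefix_sep (the separator, "_" by default). A small sketch; the prefix "Av" and the separator "__" are arbitrary values chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "CA": [100, 101, 103, 104],
    "Mode": ["Cash", "Cheque", "Cash", "Digital"],
    "Avenue": ["CC", "CC", "DF", "DF"],
})

# prefix: text placed before each category value, per encoded column
# prefix_sep: separator between prefix and value (default "_")
dummies = pd.get_dummies(
    df,
    columns=["Mode", "Avenue"],
    prefix={"Mode": "Mode", "Avenue": "Av"},
    prefix_sep="__",
    dtype=int,
)
print(dummies.columns.tolist())
```

Columns that are not listed in columns (here CA) are kept as-is and come first in the result.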
Upvotes: 0