Jose Manuel Albornoz

Reputation: 128

Get names of dummy variables created by get_dummies

I have a dataframe with a very large number of columns of different types. I want to encode the categorical variables in my dataframe using get_dummies(). The question is: is there a way to get the column headers of the encoded categorical columns created by get_dummies()?

The hard way to do this would be to extract a list of all categorical variables in the dataframe, then append each text label associated with a categorical variable to the corresponding column header. I wonder if there is an easier way to achieve the same end.
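For reference, the manual route described above might look roughly like the sketch below. It assumes that every non-numeric column is categorical and that get_dummies uses its default "<column>_<label>" naming with labels in sorted order; the toy dataframe and the names cat_cols and manual_names are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical): two text columns and one numeric column
df = pd.DataFrame({'P': ['p', 'q', 'p'], 'Q': ['q', 'p', 'r'],
                   'R': [2, 3, 4]})

# Assumption: every non-numeric column is a categorical variable
cat_cols = df.select_dtypes(exclude='number').columns

# Build "<column>_<label>" for each distinct non-null label, in sorted order
manual_names = [f"{col}_{label}"
                for col in cat_cols
                for label in sorted(df[col].dropna().unique())]

print(manual_names)
```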

Upvotes: 1

Views: 1218

Answers (1)

B. Bogart

Reputation: 1075

I think an approach that should work with all the different uses of get_dummies is to compare the column names before and after encoding:

# example data
import pandas as pd
df = pd.DataFrame({'P': ['p', 'q', 'p'], 'Q': ['q', 'p', 'r'],
                   'R': [2, 3, 4]})

dummies = pd.get_dummies(df)

# get column names that were not in the original dataframe
new_cols = dummies.columns[~dummies.columns.isin(df.columns)]

new_cols gives:

Index(['P_p', 'P_q', 'Q_p', 'Q_q', 'Q_r'], dtype='object')

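get_dummies leaves every column it does not encode unchanged and places those columns first in the result. In this example only the numeric column R is left as is, so here you could also just take the column names after the first column. With k untouched columns the slice would have to start at k instead of 1: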

dummies.columns[1:]

which on this test data gives the same result:

Index(['P_p', 'P_q', 'Q_p', 'Q_q', 'Q_r'], dtype='object')
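The slicing idea can be made less dependent on this particular test data by counting the columns that survive unchanged instead of hard-coding 1. The sketch below is only an illustration: the extra numeric column S and the names kept, encoded and by_source are made up, and it relies on get_dummies placing the untouched columns first and using the default "<column>_<label>" naming.

```python
import pandas as pd

# Example data with two numeric columns (S is added for illustration)
df = pd.DataFrame({'P': ['p', 'q', 'p'], 'Q': ['q', 'p', 'r'],
                   'R': [2, 3, 4], 'S': [0.5, 1.5, 2.5]})

dummies = pd.get_dummies(df)

# Original columns that are still present, i.e. were not encoded
kept = [c for c in df.columns if c in dummies.columns]

# Assumption: the untouched columns come first in the result
new_cols = dummies.columns[len(kept):]

# Group the dummy columns by the original column they came from.
# The prefix match can be ambiguous if one column name is a prefix of another.
encoded = [c for c in df.columns if c not in kept]
by_source = {c: [d for d in new_cols if d.startswith(c + '_')]
             for c in encoded}

print(list(new_cols))
print(by_source)
```

If the dataframe has column names that are prefixes of one another (for example 'P' and 'P_x'), the isin comparison shown earlier is the safer way to get the flat list of new names.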

Upvotes: 1
