Tim

Reputation: 39

Can you extend the list of dummies in pandas.get_dummies?

Suppose I have the following dataset (2 rows, 2 columns, headers are Char0 and Char1):

import pandas as pd

dataset = [['A', 'B'], ['B', 'C']]
columns = ['Char0', 'Char1']
df = pd.DataFrame(dataset, columns=columns)

I would like to one-hot encode the columns Char0 and Char1, so:

df = pd.concat([df, pd.get_dummies(df["Char0"], prefix='Char0')], axis=1)
df = pd.concat([df, pd.get_dummies(df["Char1"], prefix='Char1')], axis=1)
df.drop(['Char0', "Char1"], axis=1, inplace=True)

which results in a dataframe with column headers Char0_A, Char0_B, Char1_B, Char1_C.

Now I would like each column to get an indicator for every one of A, B, C, and D, even though there is currently no 'D' in the dataset. That means 8 columns: Char0_A, Char0_B, Char0_C, Char0_D, Char1_A, Char1_B, Char1_C, Char1_D.

Can somebody help me out?

Upvotes: 0

Views: 223

Answers (1)

jezrael

Reputation: 863166

Call get_dummies on the whole DataFrame, then use DataFrame.reindex to expand the result to every prefix/value combination, generated with itertools.product. Columns that do not occur in the data are added and filled with 0:

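from itertools import product

import pandas as pd

dataset = [['A', 'B'], ['B', 'C']]
columns = ['Char0', 'Char1']
df = pd.DataFrame(dataset, columns=columns)

# every value that should get an indicator column, whether or not it occurs in the data
vals = ['A', 'B', 'C', 'D']

# all prefix/value combinations, in the desired column order
cols = ['_'.join(x) for x in product(df.columns, vals)]
print(cols)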
['Char0_A', 'Char0_B', 'Char0_C', 'Char0_D', 'Char1_A', 'Char1_B', 'Char1_C', 'Char1_D']

# dtype=int keeps the 0/1 output; pandas 2.0+ returns booleans by default
df1 = pd.get_dummies(df, dtype=int).reindex(cols, axis=1, fill_value=0)

print(df1)
   Char0_A  Char0_B  Char0_C  Char0_D  Char1_A  Char1_B  Char1_C  Char1_D
0        1        0        0        0        0        1        0        0
1        0        1        0        0        0        0        1        0
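
An alternative, sketched here as a possible variation rather than as part of the original answer, is to declare the full set of values as a categorical dtype before encoding. get_dummies emits one column per declared category, so no reindex step is needed. The names cat_type, df_cat and df2 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame([['A', 'B'], ['B', 'C']], columns=['Char0', 'Char1'])

# every value that should get an indicator column (same list as above)
vals = ['A', 'B', 'C', 'D']

# convert both columns to a categorical dtype that knows all four values
cat_type = pd.CategoricalDtype(categories=vals)
df_cat = df.astype(cat_type)

# get_dummies creates a column for each declared category, used or not
df2 = pd.get_dummies(df_cat, dtype=int)
print(df2)
```

One difference to keep in mind: a value outside vals becomes NaN when cast to the categorical dtype, so that row ends up all zeros for that column group, whereas the reindex approach silently drops the unexpected dummy column. Either way it may be worth checking the data against vals first.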

Upvotes: 2
