AdmiralWen
AdmiralWen

Reputation: 721

Pandas: Turn multiple variables into a single set of dummy variables

I have a column with categories (A, B, C, D) I want to turn into dummy variables. Problem is, this column can contain multiple categories per row, like this:

DF = pd.DataFrame({'Col':['A', 'A, B', 'A, C', 'B, C, D', 'D']})

    Col
0   A
1   A, B
2   A, C
3   B, C, D
4   D

My thought at this point is to first split the variable into multiple fields using ',' as the delimiter, then dummy-code the results. Something like this:

DF2 = DF['Col'].str.split(', ', expand = True)

    0   1     2
0   A   None  None
1   A   B     None
2   A   C     None
3   B   C     D
4   D   None  None

pd.get_dummies(DF2)

    0_A 0_B 0_D 1_B 1_C 2_D
0   1   0   0   0   0   0
1   1   0   0   1   0   0
2   1   0   0   0   1   0
3   0   1   0   0   1   1
4   0   0   1   0   0   0

Finally, run some sort of loop through across the columns to create a single set of dummy variables for A, B, C, and D. This can work, but gets quite tedious with many more variables/categories. Is there an easier way to achieve this?

Upvotes: 1

Views: 636

Answers (2)

BENY
BENY

Reputation: 323226

By using pd.crosstab

import pandas as pd
df = pd.DataFrame({'Col':['A', 'A,B', 'A,C', 'B,C,D', 'D']})
df.Col=df.Col.str.split(',')
df1=df.Col.apply(pd.Series).stack()
pd.crosstab(df1.index.get_level_values(0),df1)

Out[893]: 
col_0  A  B  C  D
row_0            
0      1  0  0  0
1      1  1  0  0
2      1  0  1  0
3      0  1  1  1
4      0  0  0  1

Upvotes: 1

piRSquared
piRSquared

Reputation: 294258

Simplest way is

DF.Col.str.get_dummies(', ')

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  0  0  0  1

Slightly more complicated

from sklearn.preprocessing import MultiLabelBinarizer
from numpy.core.defchararray import split

mlb = MultiLabelBinarizer()
s = DF.Col.values.astype(str)
d = mlb.fit_transform(split(s, ', '))

pd.DataFrame(d, columns=mlb.classes_)

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  0  0  0  1

Upvotes: 6

Related Questions