How to use get_dummies in Pandas when different categories are concatenated into a string without a separator?

Question

Consider the following example:

df = pd.DataFrame({'col': ['ABC', 'BDE', 'DE', np.nan]})

df
Out[216]: 
   col
0  ABC
1  BDE
2   DE
3  NaN

I want to create a dummy variable for each letter in col.

This example therefore has five dummies: A, B, C, D, and E. In the first row, for instance, 'ABC' belongs to category A, category B, and category C.

Using get_dummies with an empty separator fails:

df.col.str.get_dummies(sep='')
Out[217]: 
   ABC  BDE  DE
0    1    0   0
1    0    1   1
2    0    0   1
3    0    0   0

The expected output for the first row is:

    A  B  C  D  E
0   1  1  1  0  0

Do you have other ideas? Thanks!

root · Accepted Answer

You can use Series.str.join to introduce a separator between each character, then use get_dummies.

df.col.str.join('|').str.get_dummies()

The resulting output:

   A  B  C  D  E
0  1  1  1  0  0
1  0  1  0  1  1
2  0  0  0  1  1
3  0  0  0  0  0
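
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The intermediate names `joined`, `dummies`, and `result` are only illustrative and do not come from the original post. It relies on '|' being the default separator of str.get_dummies, so it assumes that '|' never appears as one of the category letters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['ABC', 'BDE', 'DE', np.nan]})

# str.join treats each string as a sequence of characters and puts the
# separator between them, e.g. 'ABC' becomes 'A|B|C'. NaN stays NaN.
joined = df['col'].str.join('|')

# Split on '|' (the default separator, spelled out here for clarity)
# and build one indicator column per distinct letter.
dummies = joined.str.get_dummies(sep='|')

# Optionally attach the indicator columns to the original frame.
result = df.join(dummies)
print(result)
```

The row with NaN ends up with zeros in every dummy column, which matches the output shown above. If '|' could occur in your data, passing a different character to both str.join and the sep argument should work in the same way.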
