Reputation: 19395
Consider the following example:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col': ['ABC', 'BDE', 'DE', np.nan]})
df
Out[216]:
   col
0  ABC
1  BDE
2   DE
3  NaN
I want to create a dummy variable for each letter in col.
In this example there are five dummies: A, B, C, D and E. For instance, the first row, 'ABC',
belongs to category A, category B and category C.
Using get_dummies fails:
df.col.str.get_dummies(sep='')
Out[217]:
   ABC  BDE  DE
0    1    0   0
1    0    1   1
2    0    0   1
3    0    0   0
The expected output for the first row is:
   A  B  C  D  E
0  1  1  1  0  0
Do you have other ideas? Thanks!
Upvotes: 0
Views: 256
Reputation: 33803
You can use Series.str.join to insert a separator between the characters of each string, then call get_dummies, whose default separator is '|'.
df.col.str.join('|').str.get_dummies()
The resulting output:
   A  B  C  D  E
0  1  1  1  0  0
1  0  1  0  1  1
2  0  0  0  1  1
3  0  0  0  0  0
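For reference, here is a minimal self-contained sketch of the same approach, with the imports included and the separator passed explicitly. The variable names `dummies` and `result` are only illustrative, and joining the indicators back onto `df` is an optional extra step that the question did not ask for.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': ['ABC', 'BDE', 'DE', np.nan]})

# Insert '|' between the characters of each string, e.g. 'ABC' -> 'A|B|C'.
# The NaN entry stays NaN and ends up as an all-zero row of dummies.
dummies = df['col'].str.join('|').str.get_dummies(sep='|')

# Optional: attach the indicator columns to the original frame.
result = df.join(dummies)
print(result)
```

Note that this assumes '|' never appears as a character in col; if it might, a different separator would have to be passed to both join and get_dummies.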
Upvotes: 2