Split pandas column and create new columns that count the split values

Question

I have some goofy data where one column contains multiple values slammed together with commas:

In [62]: df = pd.DataFrame({'U': ['foo', 'bar', 'baz'], 'V': ['a,b,a,c,d', 'a,b,c', 'd,e']})                                     

In [63]: df                                                                                                                      
Out[63]: 
     U          V
0  foo  a,b,a,c,d
1  bar      a,b,c
2  baz        d,e

Now I want to split column V, drop it, and add columns a through e. Columns a through e should contain the count of occurrences of that letter in that row:

     U  a  b  c  d  e
0  foo  2  1  1  1  0
1  bar  1  1  1  0  0
2  baz  0  0  0  1  1

Maybe some combination of df['V'].str.split(',') and pandas.get_dummies but I can't quite work it out.

Edit: apparently I have to justify why my question is not a duplicate. I think why is intuitively obvious to the most casual observer.

BENY · Accepted Answer

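This is a job for str.get_dummies: split and stack the values so each one gets its own row, one-hot encode them, then sum the indicators per original row.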

pd.concat([df, df.pop('V').str.split(',', expand=True).stack().str.get_dummies().groupby(level=0).sum()], axis=1)
Out[602]: 
     U  a  b  c  d  e
0  foo  2  1  1  1  0
1  bar  1  1  1  0  0
2  baz  0  0  0  1  1
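
The same result can be reached in smaller steps, which may be easier to follow than the one-liner. This is only a minimal sketch, assuming pandas 0.25 or later (for Series.explode); the names tokens, counts, and result are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'U': ['foo', 'bar', 'baz'],
                   'V': ['a,b,a,c,d', 'a,b,c', 'd,e']})

# One row per token; the index still carries the original row label,
# so row 0 appears five times, row 1 three times, row 2 twice.
tokens = df['V'].str.split(',').explode()

# One-hot encode each token, then add the indicators up per original row
# to turn 0/1 flags into counts.
counts = tokens.str.get_dummies().groupby(level=0).sum()

# Attach the count columns to the frame without V.
result = pd.concat([df.drop(columns='V'), counts], axis=1)
print(result)
```

Unlike df.pop('V'), this version leaves df untouched and builds result as a separate frame, so it can be re-run without recreating df.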
