geo_coder

Reputation: 753

pandas advanced splitting by comma

There have been a lot of posts about splitting a single column into multiple columns, but I couldn't find an answer to a slight variation on that idea.

When you use str.split, the resulting columns follow the order in which the values appear in each row. You can go a step further, for example by sorting each row's values alphabetically.

e.g. dataframe (df):

     row
0    a, e, c, b
1    b, d, a
2    a, b, c, d, e
3    d, f

foo = df['row'].str.split(', ', expand=True)

will split based on the comma and return:

     0     1    2    3
0    a     e    c    b
....

However, that doesn't align the results by their unique value. Even if you sort each split list, the values are still packed to the left, so it only results in this:

     0    1    2    3    4    5
0    a    b    c    e
1    a    b    d
...

whereas I want it to look like this:

     0    1    2    3    4    5
0    a    b    c         e
1    a    b         d
2    a    b    c    d    e   
...
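
For reference, a minimal sketch of the sorted split described above, assuming the sample frame is built as shown and the separator is a comma followed by a space. The names tokens and packed are illustrative only.

```python
import pandas as pd

# Sample frame from the question (assumed construction).
df = pd.DataFrame({"row": ["a, e, c, b", "b, d, a", "a, b, c, d, e", "d, f"]})

# Split each string into a list of tokens and sort each list alphabetically.
tokens = df["row"].str.split(", ").apply(sorted)

# Expand the ragged lists into columns; shorter rows are padded on the right,
# so values are packed to the left rather than aligned by letter.
packed = pd.DataFrame(tokens.tolist(), index=df.index)
print(packed)
```

Each row is sorted, but a given letter still lands in a different column depending on which other letters are present in that row.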

I know I'm missing something. Do I need to add the columns first and then map the split values to the correct column? What if you don't know all of the unique values? Still learning pandas syntax so any pointers in the right direction would be appreciated.

Upvotes: 1

Views: 115

Answers (1)

BENY

Reputation: 323316

Use str.get_dummies to build one 0/1 indicator column per unique value, then multiply by the column names:

s = df.row.str.get_dummies(sep=', ')
s.mul(s.columns)
Out[239]: 
   a  b  c  d  e  f
0  a  b  c     e   
1  a  b     d      
2  a  b  c  d  e   
3           d     f
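
A self-contained sketch of the same idea, assuming the sample frame from the question with ', ' (comma plus space) as the separator. It swaps the multiplication for an explicit per-column map, which is a substitution for illustration and not part of the answer above. The name aligned is illustrative.

```python
import pandas as pd

# Sample frame from the question (assumed construction).
df = pd.DataFrame({"row": ["a, e, c, b", "b, d, a", "a, b, c, d, e", "d, f"]})

# One indicator column per unique token. The columns are discovered from the
# data and come back sorted, so the unique values need not be known up front.
s = df["row"].str.get_dummies(sep=", ")

# Replace each set indicator with its column label and each unset one with "".
aligned = s.apply(
    lambda col: col.astype(bool).map({True: col.name, False: ""})
)
print(aligned)
```

Because the columns come from the data itself, a new letter in any row simply adds another column.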

Upvotes: 1
