Reputation: 753
There have been a lot of posts concerning splitting a single column into multiples, but I couldn't find an answer to a slight modification to the idea of splitting.
When you use str.split, the pieces come back in the order they appear in the string. You can go a step further, for example by sorting the pieces alphabetically.
e.g. dataframe (df):
row
0 a, e, c, b
1 b, d, a
2 a, b, c, d, e
3 d, f
foo = df['row'].str.split(', ', expand=True)
will split based on the comma and return:
0 1 2 3
0 a e c b
....
However that doesn't align the results by their unique value. Even if you use a sort on the split string, it will still only result in this:
0 1 2 3 4 5
0 a b c e
1 a b d
...
whereas I want it to look like this:
   0  1  2  3  4  5
0  a  b  c     e
1  a  b     d
2  a  b  c  d  e
...
I know I'm missing something. Do I need to add the columns first and then map the split values to the correct column? What if you don't know all of the unique values? Still learning pandas syntax so any pointers in the right direction would be appreciated.
Upvotes: 1
Views: 115
Reputation: 323316
Using get_dummies
s = df.row.str.get_dummies(sep=', ')
s.mul(s.columns)
Out[239]:
   a  b  c  d  e  f
0  a  b  c     e
1  a  b     d
2  a  b  c  d  e
3           d     f
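For anyone who wants to reproduce this end to end, here is a minimal sketch. It assumes the values are separated by a comma followed by a single space, as in the sample data. The names `dummies`, `labels` and `aligned` are only illustrative, and it swaps the `mul` step for `numpy.where`, which avoids relying on integer-times-string multiplication. Since `str.get_dummies` builds its columns from whatever values it finds, you don't need to know the unique values in advance.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"row": ["a, e, c, b", "b, d, a", "a, b, c, d, e", "d, f"]})

# One indicator column per unique value; the columns come back sorted.
dummies = df["row"].str.get_dummies(sep=", ")

# Where the indicator is set, show the column label; otherwise an empty string.
labels = np.array(dummies.columns, dtype=object)
aligned = pd.DataFrame(
    np.where(dummies.to_numpy().astype(bool), labels, ""),
    index=dummies.index,
    columns=dummies.columns,
)
print(aligned)
```

If you would rather have positional column names (0, 1, 2, ...) as in the question, something like `aligned.columns = range(aligned.shape[1])` should do it.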
Upvotes: 1