Data:
qid qualid val
0 1845631864 227 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1 1899053658 44 1,3,3,2,2,2,3,3,4,4,4,5,5,5,5,5,5,5
2 1192887045 197 704
3 1833579269 194 139472
4 1497352469 30 120026,170154,152723,90407,63119,80077,178871,...
Problem:
The comma-separated numbers in column val need to be split into separate columns, one per number, for each row.
I don't know whether pandas allows it, but ideally each row would get exactly n columns, where n is the number of elements in that row's val.
If that is not possible, the number of columns should equal the greatest number of elements in any val, and rows with fewer elements should be padded with NaN.
Example Solution 1 for Above Problem:
qid qualid val1 val2 val3 valn-3 valn-2 valn-1 valn
0 1845631864 227 0 0 0 ...... 0 0 0 0
1 1899053658 44 1 3 3 ...... 5
2 1192887045 197 704
3 1833579269 194 139472
4 1497352469 30 120026 170154 152723.....63119 80077 178871 12313
Alternate Solution 2 for Above Problem:
qid qualid val1 val2 val3 valn-3 valn-2 valn-1 valn
0 1845631864 227 0 0 0 ...... 0 0 0 0
1 1899053658 44 1 3 3 ...... 5 NaN NaN NaN
2 1192887045 197 704 NaN NaN ...... NaN NaN NaN NaN
3 1833579269 194 139472 NaN NaN ...... NaN NaN NaN
4 1497352469 30 120026 170154 152723.....63119 80077 178871 12313
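Some context on Solution 1: a pandas DataFrame is rectangular, so every row has the same set of columns and a literally ragged layout isn't possible. The closest equivalent is keeping one list per row, where each list holds exactly n elements. A minimal sketch, using the sample data above with row 4 cut down to its visible values (the question truncates it with "..."):

```python
import pandas as pd

# Sample data from the question. Row 4's val is truncated there,
# so only its visible values are used here.
df = pd.DataFrame({
    "qid": [1845631864, 1899053658, 1192887045, 1833579269, 1497352469],
    "qualid": [227, 44, 197, 194, 30],
    "val": [
        ",".join(["0"] * 18),  # 18 zeros, as in the question
        "1,3,3,2,2,2,3,3,4,4,4,5,5,5,5,5,5,5",
        "704",
        "139472",
        "120026,170154,152723,90407,63119,80077,178871",
    ],
})

# Without expand=True, split keeps one list per row, so each row
# holds exactly as many elements as its own val string contains.
val_lists = df["val"].str.split(",")

# .str.len() also works on list elements: n for each row.
n_per_row = val_lists.str.len()
print(n_per_row.tolist())  # [18, 18, 1, 1, 7]
```

Anything that needs real columns (one per element) has to pad the shorter rows, which is the Solution 2 layout.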
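You can use str.split with expand=True, which puts each comma-separated element in its own column (rows with fewer elements are padded with None):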
pd.concat([df, df.val.str.split(',', expand=True).add_prefix('Val_')], axis=1)
Out[29]:
qid qualid ... Val_16 Val_17
0 1845631864 227 ... 0 0
1 1899053658 44 ... 5 5
2 1192887045 197 ... None None
3 1833579269 194 ... None None
4 1497352469 30 ... None None
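A variant of the one-liner above that matches the question's Solution 2 more closely: the pieces returned by str.split are strings and the padding is None, so pd.to_numeric is applied to turn them into numbers with NaN for the missing entries, and the columns are renamed val1 ... valn. This is a sketch that assumes the same sample data (row 4 again limited to its visible values) and that the original val column should be dropped:

```python
import pandas as pd

# Sample data from the question (row 4 truncated to its visible values).
df = pd.DataFrame({
    "qid": [1845631864, 1899053658, 1192887045, 1833579269, 1497352469],
    "qualid": [227, 44, 197, 194, 30],
    "val": [
        ",".join(["0"] * 18),  # 18 zeros, as in the question
        "1,3,3,2,2,2,3,3,4,4,4,5,5,5,5,5,5,5",
        "704",
        "139472",
        "120026,170154,152723,90407,63119,80077,178871",
    ],
})

# One column per element; shorter rows are padded with missing values.
parts = df["val"].str.split(",", expand=True)

# Convert each column of strings to numbers; missing entries become NaN.
parts = parts.apply(pd.to_numeric)

# Name the columns val1 ... valn, as in the desired output.
parts.columns = [f"val{i + 1}" for i in range(parts.shape[1])]

# Replace the original val column with the new numeric columns.
result = pd.concat([df.drop(columns="val"), parts], axis=1)
print(result.shape)  # (5, 20)
```

Columns that contain any NaN end up as float64, because NaN is a float; columns with no gaps (such as val1 here) stay integer.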