Reputation: 31
I am trying to handle my data set which contain some features that has some multiple values per instances as shown on the image
https://i.sstatic.net/D78el.png
I am trying to separate each value by '|' symbol to apply One-Hot encoding technique but I can't find any suitable solution to my problem
My idea is to keep every multiple values in one row or by another word convert each cell to list of integers
Upvotes: 0
Views: 438
Reputation: 3720
Maybe this is what you want:
df = pd.DataFrame(['465','444','465','864|857|850|843'],columns=['genre_ids'])
df
genre_ids
0 465
1 444
2 465
3 864|857|850|843
df['genre_ids'].str.get_dummies(sep='|')
444 465 843 850 857 864
0 0 1 0 0 0 0
1 1 0 0 0 0 0
2 0 1 0 0 0 0
3 0 0 1 1 1 1
Upvotes: 1