Ahmed Alshoki

Reputation: 31

Handling features with multiple values per instance in Python for Machine Learning model

I am working with a data set in which some features hold multiple values per instance, as shown in the image:
https://i.sstatic.net/D78el.png
I am trying to split each value on the '|' symbol so that I can apply one-hot encoding, but I can't find a suitable solution.
My idea is to keep the multiple values in one row, or in other words, to convert each cell to a list of integers.

Upvotes: 0

Views: 438

Answers (1)

sitting_duck

Reputation: 3720

Maybe this is what you want: Series.str.get_dummies splits each cell on a separator and one-hot encodes the pieces in one step.

import pandas as pd

df = pd.DataFrame(['465','444','465','864|857|850|843'],columns=['genre_ids'])
df

         genre_ids
0              465
1              444
2              465
3  864|857|850|843

df['genre_ids'].str.get_dummies(sep='|')

   444  465  843  850  857  864
0    0    1    0    0    0    0
1    1    0    0    0    0    0
2    0    1    0    0    0    0
3    0    0    1    1    1    1
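
If you also want the list-of-integers form mentioned in the question, or want the one-hot columns attached to the original frame, something like the following might work. It is only a sketch: the column name genre_list and the prefix genre_ are made up for illustration, and it assumes every cell is a '|'-separated string of integers with no missing values.

```python
import pandas as pd

df = pd.DataFrame({"genre_ids": ["465", "444", "465", "864|857|850|843"]})

# Split each cell on '|' and convert the pieces to integers.
# "genre_list" is a hypothetical column name chosen for this example.
df["genre_list"] = df["genre_ids"].str.split("|").apply(
    lambda parts: [int(p) for p in parts]
)

# One-hot encode the original strings; the "genre_" prefix (also
# hypothetical) keeps the new columns from clashing with existing ones.
dummies = df["genre_ids"].str.get_dummies(sep="|").add_prefix("genre_")

# Attach the indicator columns to the original frame by index.
encoded = df.join(dummies)
print(encoded)
```

The list column is handy for inspection, while the indicator columns are what most models expect as input.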

Upvotes: 1
