Reputation: 1144
Here is my data in pandas:

      Movie        Tags
0  War film  tank;plane
1  Spy film   car;plane
I would like to create one new column per tag found in the Tags column, filled with 0 and 1, and add a prefix like 'T_' to the new column names.
Like this:

      Movie        Tags  T_tank  T_plane  T_car
0  War film  tank;plane       1        1      0
1  Spy film   car;plane       0        1      1
I have some ideas on how to do it, such as going row by row with split(";") and assigning through df.loc[:, 'T_plane'], for example. But I think that may not be the optimal way to do it.
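For reference, a minimal sketch of that row-by-row idea. The DataFrame construction below is an assumption made for illustration, based on the sample data shown above.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    "Movie": ["War film", "Spy film"],
    "Tags": ["tank;plane", "car;plane"],
})

# Row-by-row version: split each Tags string and set one indicator column per tag.
for idx, tags in df["Tags"].items():
    for tag in tags.split(";"):
        col = "T_" + tag
        if col not in df.columns:
            df[col] = 0       # first time this tag is seen: start the column at 0
        df.loc[idx, col] = 1  # mark the current row as having the tag

print(df)
```

This works, but it loops in Python over every row and tag, which is why a vectorized alternative is worth looking for on larger frames.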
Regards
Upvotes: 2
Views: 416
Reputation: 164613
Using the sklearn library:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
# Split each Tags string into a list of tags, then binarize into one column per tag.
res = df.join(pd.DataFrame(mlb.fit_transform(df['Tags'].str.split(';')),
                           columns=mlb.classes_).add_prefix('T_'))
print(res)
      Movie        Tags  T_car  T_plane  T_tank
0  War film  tank;plane      0        1       1
1  Spy film   car;plane      1        1       0
Upvotes: 2
Reputation: 59519
With .str.get_dummies:
df.join(df.Tags.str.get_dummies(';').add_prefix('T_'))
      Movie        Tags  T_car  T_plane  T_tank
0  War film  tank;plane      0        1       1
1  Spy film   car;plane      1        1       0
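
A self-contained sketch of the same approach, assuming the frame is constructed as below (the construction and the names dummies, order, and res are illustrative, not from the question). str.get_dummies returns the tag columns in sorted order, so the optional last step reorders them by first appearance to match the layout requested in the question.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed construction).
df = pd.DataFrame({
    "Movie": ["War film", "Spy film"],
    "Tags": ["tank;plane", "car;plane"],
})

# One 0/1 column per distinct tag, prefixed with 'T_'.
dummies = df["Tags"].str.get_dummies(sep=";").add_prefix("T_")

# Optional: order the tag columns by first appearance instead of alphabetically.
order = pd.unique(df["Tags"].str.split(";").explode())
res = df.join(dummies[["T_" + tag for tag in order]])

print(res)
```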
Upvotes: 1