Reputation: 1439
Let's say I have the following df
   x
1  ['abc','bac','cab']
2  ['bac']
3  ['abc','cab']
And I would like to turn each element of each list into its own indicator column (1 if the row's list contains it, 0 otherwise), like so:
   abc  bac  cab
1    1    1    1
2    0    1    0
3    1    0    1
I have looked at multiple links but can't seem to get this right.
Thanks!
Upvotes: 2
Views: 1256
Reputation: 323236
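I would use MultiLabelBinarizer from scikit-learn:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
# fit_transform returns a 0/1 array with one column per distinct label
s = pd.DataFrame(mlb.fit_transform(df['x']), columns=mlb.classes_, index=df.index)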
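To make it clearer what the binarizer computes, here is a rough plain-Python sketch of the same indicator matrix. It is not scikit-learn's implementation, only an illustration; the names `rows`, `classes` and `indicator` are made up for this example, and it assumes the columns should come out in sorted label order.

```python
# Plain-Python sketch of the indicator matrix; no pandas or scikit-learn needed.
rows = [['abc', 'bac', 'cab'], ['bac'], ['abc', 'cab']]

# Collect the distinct labels in sorted order (these become the columns).
classes = sorted({label for row in rows for label in row})

# One 0/1 row per input list: 1 if the label is present, else 0.
indicator = [[int(label in row) for label in classes] for row in rows]

print(classes)    # ['abc', 'bac', 'cab']
print(indicator)  # [[1, 1, 1], [0, 1, 0], [1, 0, 1]]
```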
Upvotes: 1
Reputation: 35636
One approach with str.join + str.get_dummies:
out = df['x'].str.join(',').str.get_dummies(',')
out:
   abc  bac  cab
0    1    1    1
1    0    1    0
2    1    0    1
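One caveat with the join-based approach: it relies on the separator never appearing inside a label. If that might happen, a different separator can be used. The sketch below is only an illustration with made-up data (`df2` and its labels are hypothetical) and assumes '|' does not occur in any label.

```python
import pandas as pd

# Hypothetical data where one label itself contains a comma.
df2 = pd.DataFrame({'x': [['a,b', 'c'], ['c']]})

# Join and split on '|' so the comma inside 'a,b' is left alone.
out2 = df2['x'].str.join('|').str.get_dummies('|')
print(out2)
```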
Or with explode + pd.get_dummies, then groupby max:
out = pd.get_dummies(df['x'].explode()).groupby(level=0).max()
out:
   abc  bac  cab
0    1    1    1
1    0    1    0
2    1    0    1
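Depending on the pandas version, pd.get_dummies may return boolean columns (True/False) rather than 0/1; newer releases default to bool. A minimal sketch that asks for integers explicitly via the dtype parameter, assuming the same df as in this answer:

```python
import pandas as pd

df = pd.DataFrame({'x': [['abc', 'bac', 'cab'], ['bac'], ['abc', 'cab']]})

# dtype=int requests 0/1 columns regardless of the version's default dtype.
out = pd.get_dummies(df['x'].explode(), dtype=int).groupby(level=0).max()
print(out)
```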
You can also use pd.crosstab after explode if you want counts instead of dummies:
s = df['x'].explode()
out = pd.crosstab(s.index, s)
out:
x      abc  bac  cab
row_0
0        1    1    1
1        0    1    0
2        1    0    1
*Note: the output is the same here, but the values will be counts if a list contains duplicates.
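To illustrate the difference, here is a small sketch with a duplicated label. The frame `dup` is made up for this example, and clipping at 1 is just one possible way to turn the counts back into dummies.

```python
import pandas as pd

# Hypothetical data: 'abc' appears twice in the first list.
dup = pd.DataFrame({'x': [['abc', 'abc', 'cab'], ['bac']]})
s = dup['x'].explode()

# crosstab counts occurrences, so the duplicated label shows up as 2.
counts = pd.crosstab(s.index, s)

# Capping every count at 1 gives back 0/1 indicators.
dummies = counts.clip(upper=1)

print(counts)
print(dummies)
```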
DataFrame used above:
import pandas as pd
df = pd.DataFrame({
    'x': [['abc', 'bac', 'cab'], ['bac'], ['abc', 'cab']]
})
Upvotes: 3