Reputation: 18513
If I have the following pandas DataFrame:
df = pd.DataFrame(columns=['name', 'tags'], data=[
['Rob', ['a', 'c']],
['Erica', ['b', 'c']]
])
table:
Name tags
Rob ['a', 'c']
Erica ['b', 'c']
How would I convert this into:
Name tags_a tags_b tags_c
Rob 1 0 1
Erica 0 1 1
If each row could only have 1 tag, I could do this with pd.get_dummies(df, columns=['tags']), but this doesn't work when tags is a list.
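For reference, here is a minimal sketch of the single-tag case that does work. The frame `single` is made up for illustration (one string tag per row), and `dtype=int` is only there so the output shows 0/1 instead of booleans on newer pandas versions.

```python
import pandas as pd

# Hypothetical single-tag version of the data: each row has exactly one tag.
single = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', 'a'],
    ['Erica', 'b'],
])

# get_dummies encodes the 'tags' column and prefixes the new columns with 'tags_'.
encoded = pd.get_dummies(single, columns=['tags'], dtype=int)
print(encoded)
```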
Upvotes: 2
Views: 4530
Reputation: 19947
# use apply to turn each row into [name, flag_a, flag_b, flag_c]
# (np.isin replaces np.in1d, which is deprecated in recent NumPy)
df2 = df.apply(lambda x: [x['name']] + np.isin(['a', 'b', 'c'], x.tags).astype(int).tolist(), axis=1).apply(pd.Series)
# rename columns
df2.columns = ['name', 'tags_a', 'tags_b', 'tags_c']
df2
Out[505]:
name tags_a tags_b tags_c
0 Rob 1 0 1
1 Erica 0 1 1
Upvotes: 0
Reputation: 21264
# reorganize data
df = pd.get_dummies(df.set_index('name').tags
                      .apply(pd.Series)
                      .stack()
                   ).unstack()
# remove multilevel column and collapse counts per name
df.columns = df.columns.droplevel(1)
df.groupby(by=df.columns, axis=1).sum().add_prefix('tags_')
tags_a tags_b tags_c
name
Rob 1 0 1
Erica 0 1 1
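On newer pandas versions, the same "reshape, then collapse per row" idea can probably be written more directly with Series.explode. This is a sketch of a variant, not the code above: it assumes pandas 0.25 or later (where explode is available), and the names `dummies` and `result` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', ['a', 'c']],
    ['Erica', ['b', 'c']],
])

# explode() gives one row per tag and repeats the original index label,
# so rows belonging to the same person share an index value.
exploded = df['tags'].explode()

# One indicator column per tag, then collapse back to one row per original index.
dummies = pd.get_dummies(exploded, prefix='tags', dtype=int).groupby(level=0).max()

# Attach the indicator columns to the name column.
result = df[['name']].join(dummies)
print(result)
```

Using max() rather than sum() keeps the values at 0/1 even if a tag is repeated within one list.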
Upvotes: 2
Reputation: 294218
Use str.get_dummies:
df.tags.str.join('|').str.get_dummies().add_prefix('tags_')
tags_a tags_b tags_c
0 1 0 1
1 0 1 1
To keep the name column, join the result back:
df[['name']].join(df.tags.str.join('|').str.get_dummies().add_prefix('tags_'))
name tags_a tags_b tags_c
0 Rob 1 0 1
1 Erica 0 1 1
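A self-contained sketch of the same approach, wrapped in a small helper. The function name `tags_to_dummies` and its parameters are made up for illustration. It assumes every tag is a string and that the separator never appears inside a tag; otherwise the joined string would be split in the wrong places.

```python
import pandas as pd

def tags_to_dummies(df, col='tags', sep='|'):
    """Hypothetical helper: one-hot encode a column of string lists.

    Assumes `sep` does not occur inside any tag.
    """
    # Join each list into a single delimited string, e.g. ['a', 'c'] -> 'a|c'.
    joined = df[col].str.join(sep)
    # Split on the same separator and build one 0/1 column per distinct tag.
    dummies = joined.str.get_dummies(sep=sep).add_prefix(col + '_')
    # Replace the list column with the indicator columns.
    return df.drop(columns=[col]).join(dummies)

df = pd.DataFrame(columns=['name', 'tags'], data=[
    ['Rob', ['a', 'c']],
    ['Erica', ['b', 'c']],
])

result = tags_to_dummies(df)
print(result)
```

If the tags could contain '|', passing a different `sep` that is known not to occur in the data should avoid the problem.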
Upvotes: 8