Reputation: 2752
I am trying to reshape my table into the form shown below, but for some reason I could not get my pivot code working.
import pandas as pd
df = pd.DataFrame([('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')], columns=['user', 'val'])
df
---
user val
a f1
a f2
a f3
b f4
c f2
c f4
>> output
user f1 f2 f3 f4
a 1 1 1 0
b 0 0 0 1
c 0 1 0 1
Upvotes: 4
Views: 102
Reputation: 210932
Yet another solution, using scikit-learn's CountVectorizer:
In [82]: from sklearn.feature_extraction.text import CountVectorizer
In [83]: cv = CountVectorizer()
In [84]: d2 = df.groupby('user')['val'].agg(' '.join).reset_index(name='val')
In [85]: d2
Out[85]:
user val
0 a f1 f2 f3
1 b f4
2 c f2 f4
In [86]: r = pd.DataFrame.sparse.from_spmatrix(cv.fit_transform(d2['val']),
...: index=d2.index,
...: columns=cv.get_feature_names_out())
...:
In [88]: d2[['user']].join(r)
Out[88]:
user f1 f2 f3 f4
0 a 1 1 1 0
1 b 0 0 0 1
2 c 0 1 0 1
Upvotes: 2
Reputation: 402922
Option 1
get_dummies with groupby + sum
df.set_index('user').val.str.get_dummies().groupby(level=0).sum()
f1 f2 f3 f4
user
a 1 1 1 0
b 0 0 0 1
c 0 1 0 1
Option 2
groupby + value_counts + unstack
df.groupby('user').val.value_counts().unstack(fill_value=0)
val f1 f2 f3 f4
user
a 1 1 1 0
b 0 0 0 1
c 0 1 0 1
Option 3
pivot_table with size as the aggfunc.
df.pivot_table(index='user', columns='val', aggfunc='size', fill_value=0)
val f1 f2 f3 f4
user
a 1 1 1 0
b 0 0 0 1
c 0 1 0 1
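For anyone who wants to check the three options against each other, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes a reasonably recent pandas; the names opt1, opt2, opt3, cols and results are just illustrative. Option 1 is written with groupby(level=0).sum(), which is the grouped form of the per-user sum.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [('a', 'f1'), ('a', 'f2'), ('a', 'f3'), ('b', 'f4'), ('c', 'f2'), ('c', 'f4')],
    columns=['user', 'val'],
)

# Option 1: one-hot encode each value, then add up the rows per user.
opt1 = df.set_index('user')['val'].str.get_dummies().groupby(level=0).sum()

# Option 2: count each (user, val) pair, then spread val across the columns.
opt2 = df.groupby('user')['val'].value_counts().unstack(fill_value=0)

# Option 3: pivot_table with the group size as the aggregate.
opt3 = df.pivot_table(index='user', columns='val', aggfunc='size', fill_value=0)

# Compare plain values in a fixed column order; the name of the column
# axis differs between the options, so the frames are not compared directly.
cols = ['f1', 'f2', 'f3', 'f4']
results = [opt[cols].to_numpy().tolist() for opt in (opt1, opt2, opt3)]

print(opt3)
print(results[0] == results[1] == results[2])
```

All three options count occurrences, so if the same (user, val) pair can appear more than once, a cell can exceed 1; appending .clip(upper=1) to any of them should turn the counts back into 0/1 indicators.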
Upvotes: 5