Reputation: 17527
Is there a way of aggregating or transforming in pandas that would give me the list of values that are present in each group.
For example, taking this data
+---------+-----------+
| user_id | module_id |
+---------+-----------+
| 1 | A |
| 1 | B |
| 1 | C |
| 2 | A |
| 2 | B |
| 2 | D |
| 3 | B |
| 3 | C |
| 3 | D |
| 3 | E |
+---------+-----------+
how would I complete this code
df.groupby('user_id')
to give the result C, the only module_id that is in each of the groups?
Upvotes: 2
Views: 41
Reputation: 863216
Use get_dummies
with max
for indicator DataFrame
and then filter only 1
columns - 1
values are processes like True
s in DataFrame.all
:
cols = (pd.get_dummies(df.set_index('user_id')['module_id'])
.max(level=0)
.loc[:, lambda x: x.all()].columns)
print (cols)
Index(['B'], dtype='object')
Similar solution:
df1 = pd.get_dummies(df.set_index('user_id')['module_id']).max(level=0)
print (df1)
A B C D E
user_id
1 1 1 1 0 0
2 1 1 0 1 0
3 0 1 1 1 1
cols = df1.columns[df1.all()]
More solutions:
cols = df.groupby(['module_id', 'user_id']).size().unstack().dropna().index
print (cols)
Index(['B'], dtype='object', name='module_id')
cols = df.pivot_table(index='module_id', columns='user_id', aggfunc='size').dropna().index
Upvotes: 1