dumbledad
dumbledad

Reputation: 17527

Find which values are in every group in pandas

Is there a way of aggregating or transforming in pandas that would give me the list of values that are present in each group.

For example, taking this data

+---------+-----------+
| user_id | module_id |
+---------+-----------+
|       1 |         A |
|       1 |         B |
|       1 |         C |
|       2 |         A |
|       2 |         B |
|       2 |         D |
|       3 |         B |
|       3 |         C |
|       3 |         D |
|       3 |         E |
+---------+-----------+

how would I complete this code

df.groupby('user_id')

to give the result C, the only module_id that is in each of the groups?

Upvotes: 2

Views: 41

Answers (1)

jezrael
jezrael

Reputation: 863216

Use get_dummies with max for indicator DataFrame and then filter only 1 columns - 1 values are processes like Trues in DataFrame.all:

cols = (pd.get_dummies(df.set_index('user_id')['module_id'])
          .max(level=0)
          .loc[:, lambda x: x.all()].columns)
print (cols)
Index(['B'], dtype='object')

Similar solution:

df1 = pd.get_dummies(df.set_index('user_id')['module_id']).max(level=0)
print (df1)
         A  B  C  D  E
user_id               
1        1  1  1  0  0
2        1  1  0  1  0
3        0  1  1  1  1
cols = df1.columns[df1.all()]

More solutions:

cols = df.groupby(['module_id', 'user_id']).size().unstack().dropna().index
print (cols)
Index(['B'], dtype='object', name='module_id')

cols = df.pivot_table(index='module_id', columns='user_id', aggfunc='size').dropna().index

Upvotes: 1

Related Questions