Find which values are in every group in pandas

Question

Is there a way of aggregating or transforming in pandas that would give me the list of values that are present in each group.

For example, taking this data

+---------+-----------+
| user_id | module_id |
+---------+-----------+
|       1 |         A |
|       1 |         B |
|       1 |         C |
|       2 |         A |
|       2 |         B |
|       2 |         D |
|       3 |         B |
|       3 |         C |
|       3 |         D |
|       3 |         E |
+---------+-----------+

how would I complete this code

df.groupby('user_id')

to give the result C, the only module_id that is in each of the groups?

jezrael · Accepted Answer

Use get_dummies with max for indicator DataFrame and then filter only 1 columns - 1 values are processes like Trues in DataFrame.all:

cols = (pd.get_dummies(df.set_index('user_id')['module_id'])
          .max(level=0)
          .loc[:, lambda x: x.all()].columns)
print (cols)
Index(['B'], dtype='object')

Find which values are in every group in pandas

Answers (1)

Related Questions