code_learner
code_learner

Reputation: 223

How to groupby one column and get all common values in another column pandas

I have a dataframe as below:

df

ID  Issues
A    ['P1', 'S2', 'L12']
A    ['P1']
A    ['L12']
B    ['X5', 'K7']
B    ['K7']
C    ['F4']
C    ['G9']

I need to group by ID and get all common values within each group for Issues column

I tried df['result'] = df.groupby('ID')['Issues'].transform(lambda x: x.mode().iat[0]) but the result was not as expected

Desired result

ID  Issues                Result
A    ['P1', 'S2', 'L12']  ['P1', L12']
A    ['P1']               ['P1', L12']
A    ['L12']              ['P1', L12']
B    ['X5', 'K7']         ['K7']
B    ['K7']               ['K7']
C    ['F4']               []
C    ['G9']               []

Any ideas or help ?

Upvotes: 0

Views: 502

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150735

You want to identify all issues that appear more than once. If so, you can use explode then check for duplicates:

dup_issues = (df.explode('Issues')
   .loc[lambda x: x.duplicated(['ID','Issues'], keep=False)]
   .groupby('ID').Issues
   .agg(lambda x: list(set(x)))
)

df['Result'] = df['ID'].map(dup_issues)

Output:

  ID         Issues     Result
0  A  [P1, S2, L12]  [P1, L12]
1  A           [P1]  [P1, L12]
2  A          [L12]  [P1, L12]
3  B       [X5, K7]   [K7, K7]
4  B           [K7]   [K7, K7]
5  C           [F4]        NaN
6  C           [G9]        NaN

Upvotes: 5

Related Questions