Reputation: 223
I have a dataframe as below:
df
ID Issues
A ['P1', 'S2', 'L12']
A ['P1']
A ['L12']
B ['X5', 'K7']
B ['K7']
C ['F4']
C ['G9']
I need to group by ID and get all common values within each group for the Issues column.
I tried df['result'] = df.groupby('ID')['Issues'].transform(lambda x: x.mode().iat[0])
but the result was not as expected.
Desired result
ID Issues Result
A ['P1', 'S2', 'L12'] ['P1', 'L12']
A ['P1'] ['P1', 'L12']
A ['L12'] ['P1', 'L12']
B ['X5', 'K7'] ['K7']
B ['K7'] ['K7']
C ['F4'] []
C ['G9'] []
Any ideas or help?
Upvotes: 0
Views: 502
Reputation: 150735
You want to identify all issues that appear more than once within each ID. If so, you can use explode, then check for duplicates:
# explode the lists so each (ID, issue) pair is its own row,
# keep only pairs that occur more than once within an ID,
# then collect the unique repeated issues per ID
dup_issues = (df.explode('Issues')
              .loc[lambda x: x.duplicated(['ID', 'Issues'], keep=False)]
              .groupby('ID').Issues
              .agg(lambda x: list(set(x)))
              )

# map the repeated issues back onto every row of the original frame
df['Result'] = df['ID'].map(dup_issues)
Output:
ID Issues Result
0 A [P1, S2, L12] [P1, L12]
1 A [P1] [P1, L12]
2 A [L12] [P1, L12]
3 B [X5, K7] [K7]
4 B [K7] [K7]
5 C [F4] NaN
6 C [G9] NaN
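Note that IDs with no repeated issues (C here) come back as NaN rather than the empty list in your desired output. A minimal follow-up sketch, assuming you want an actual empty list on those rows:

# replace the NaN produced by .map for IDs with no repeated issues
# with an empty list, matching the desired output for C
df['Result'] = df['Result'].apply(lambda x: x if isinstance(x, list) else [])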
Upvotes: 5