Reputation: 398
I have a dataframe:
df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
'id_res': [12, 12, 14, 28, 8, 4, 89]})
col id_res
0 paris 12
1 paris 12
2 nantes 14
3 berlin 28
4 berlin 8
5 berlin 4
6 tokyo 89
I want to create a "check" column whose values are as follows:
The output I want is therefore:
col id_res check
0 paris 12 False
1 paris 12 False
2 nantes 14 False
3 berlin 28 True
4 berlin 8 False
5 berlin 4 False
6 tokyo 89 False
I tried with groupby but no satisfactory result. Can anyone help me plz
Upvotes: 4
Views: 114
Reputation: 120469
Create 2 boolean masks then combine them and find the highest id_res
value per col
:
m1 = df['col'].duplicated(keep=False)
m2 = ~df['id_res'].duplicated(keep=False)
df['check'] = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())
print(df)
# Output
col id_res check
0 paris 12 False
1 paris 12 False
2 nantes 14 False
3 berlin 28 True
4 berlin 8 False
5 berlin 4 False
6 tokyo 89 False
Details:
>>> pd.concat([df, m1.rename('m1'), m2.rename('m2')])
col id_res check m1 m2
0 paris 12 False True False
1 paris 12 False True False
2 nantes 14 False False True
3 berlin 28 True True True # <- group to check
4 berlin 8 False True True # <- because
5 berlin 4 False True True # <- m1 and m2 are True
6 tokyo 89 False False True
Upvotes: 7
Reputation: 261810
You basically have 3 conditions, so use masks and take the logical intersection (AND/&
):
g = df_test.groupby('col')['id_res']
# is col duplicated?
m1 = df_test['col'].duplicated(keep=False)
# [ True True False True True True False]
# is id_res max of its group?
m2 = df_test['id_res'].eq(g.transform('max'))
# [ True True True True False False True]
# is group diverse? (more than 1 id_res)
m3 = g.transform('nunique').gt(1)
# [False False False True True True False]
# check if all conditions True
df_test['check'] = m1&m2&m3
Output:
col id_res check
0 paris 12 False
1 paris 12 False
2 nantes 14 False
3 berlin 28 True
4 berlin 8 False
5 berlin 4 False
6 tokyo 89 False
Upvotes: 5