Reputation: 1489
I have this DF:
In [106]: dfTest = pd.DataFrame({'name': ['a', 'a', 'b', 'b'], 'value': ['x', 'y', 'x', 'h']})
In [107]: dfTest
Out[107]:
  name value
0    a     x
1    a     y
2    b     x
3    b     h
My goal is to get one row per name group, where the value kept depends on the group: if a name has h in the value column, I'd like to keep that row. Otherwise, any value will do. For example:
In [109]: dfTest
Out[109]:
  name value
0    a     x
1    b     h
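To pin the requirement down, here is a small checker that a candidate result can be tested against. The helper name satisfies_rule is hypothetical (not part of pandas), and it assumes the result has the same name and value columns as dfTest.

```python
import pandas as pd


def satisfies_rule(df: pd.DataFrame, result: pd.DataFrame) -> bool:
    """Hypothetical helper: check a candidate result against the requirement.

    - exactly one row per distinct 'name'
    - if a name has any 'h' in 'value', the kept value must be 'h'
    - otherwise the kept value must be one of that name's original values
    """
    # One row per name: the kept names must match the distinct names exactly.
    if sorted(result['name']) != sorted(df['name'].unique()):
        return False
    for name, kept in zip(result['name'], result['value']):
        group_values = set(df.loc[df['name'] == name, 'value'])
        if 'h' in group_values:
            if kept != 'h':
                return False
        elif kept not in group_values:
            return False
    return True


dfTest = pd.DataFrame({'name': ['a', 'a', 'b', 'b'],
                       'value': ['x', 'y', 'x', 'h']})
desired = pd.DataFrame({'name': ['a', 'b'], 'value': ['x', 'h']})
print(satisfies_rule(dfTest, desired))  # True
```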
Upvotes: 0
Views: 41
Reputation: 150785
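Another approach, using drop_duplicates: sort the rows so that those with h in value come last (True sorts after False), then keep the last row for each name: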
(dfTest.loc[dfTest['value'].eq('h').sort_values().index]
.drop_duplicates('name', keep='last')
)
Output:
  name value
1    a     y
3    b     h
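A self-contained sketch of the same drop_duplicates idea. Two additions are mine, not the answer's: kind='stable' keeps the non-h rows in their original relative order (the default sort algorithm is not guaranteed to be stable), so which row survives for a name without h is deterministic, and sort_index() restores the original row order at the end.

```python
import pandas as pd

dfTest = pd.DataFrame({'name': ['a', 'a', 'b', 'b'],
                       'value': ['x', 'y', 'x', 'h']})

# True where the row holds 'h'; a stable sort moves those rows to the end
# while keeping the other rows in their original relative order.
is_h = dfTest['value'].eq('h')
order = is_h.sort_values(kind='stable').index

# For each name, the last row in this order is an 'h' row if the group has one,
# otherwise it is the group's last original row.
result = dfTest.loc[order].drop_duplicates('name', keep='last').sort_index()
print(result)
#   name value
# 1    a     y
# 3    b     h
```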
Upvotes: 1
Reputation: 153500
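You can do it this way. For each name, (x=='h').idxmax() returns the index label of the first h row, or the group's first label when the group has no h (on an all-False mask, idxmax returns the first occurrence). reindex then selects those rows: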
dfTest.reindex(dfTest.groupby('name')['value'].agg(lambda x: (x=='h').idxmax()))
Output:
      name value
value
0        a     x
3        b     h
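The extra value line in that output is the index name: reindex receives the aggregated Series, whose name is value, and uses it as the new index. A minimal variant, assuming you want a plain unnamed index, reuses the same per-group idxmax but passes the labels to .loc as a list:

```python
import pandas as pd

dfTest = pd.DataFrame({'name': ['a', 'a', 'b', 'b'],
                       'value': ['x', 'y', 'x', 'h']})

# Per name: label of the first 'h' row, or of the group's first row
# when the group contains no 'h'.
keep = dfTest.groupby('name')['value'].agg(lambda s: s.eq('h').idxmax())

# Select those rows; .loc with a plain list leaves the index unnamed.
result = dfTest.loc[keep.tolist()]
print(result)
#   name value
# 0    a     x
# 3    b     h
```

Since the labels come from dfTest itself, .loc cannot hit a missing label here; reindex would fill NaN for one, .loc would raise a KeyError.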
Upvotes: 2