Reputation: 647
A B C
0 2002-01-13 15:00:00 120
1 2002-01-13 15:30:00 110
2 2002-01-13 16:00:00 130
3 2002-01-13 16:30:00 140
4 2002-01-14 15:00:00 180
5 2002-01-14 15:30:00 165
6 2002-01-14 16:00:00 150
7 2002-01-14 16:30:00 170
I want to select one row per each A group, with next conditions:
Output should be:
A B C
0 2002-01-13 15:00:00 120
5 2002-01-14 15:30:00 165
Upvotes: 0
Views: 58
Reputation: 164613
This is one vectorised solution. Sometimes a helper column is more efficient than an in-line lambda
-based solution.
df['Floor'] = df['C'] - (df.groupby('A')['C'].transform('min') + 10)
res = df.loc[df[df['Floor'] >= 0].groupby('A')['Floor'].idxmin()]
Result:
A B C Floor
0 2002-01-13 15:00:00 120 0
5 2002-01-14 15:30:00 165 5
Upvotes: 1
Reputation: 862406
As @Anton vBR commented, first remove rows by condition per groups and then get rows by minimal C
by idxmax
and select by loc
:
df = df[df.groupby('A')['C'].transform(lambda x: x >= x.min() + 10)]
#filtering with transform `min` only
#df = df[df.groupby('A')['C'].transform('min') + 10 <= df['C']]
print (df)
A B C
0 2002-01-13 15:00:00 120
2 2002-01-13 16:00:00 130
3 2002-01-13 16:30:00 140
4 2002-01-14 15:00:00 180
5 2002-01-14 15:30:00 165
7 2002-01-14 16:30:00 170
df = df.loc[df.groupby('A')['C'].idxmin()]
What is same as:
idx=df.sort_values(['A','C']).groupby('A')['C'].apply(lambda x: (x >= x.min() + 10).idxmax())
df = df.loc[idx]
Alternative solution with sort_values
with drop_duplicates
:
df = df.sort_values(['A','C'])
df = df[df.groupby('A')['C'].transform(lambda x: x >= x.min() + 10)].drop_duplicates(['A'])
print (df)
A B C
0 2002-01-13 15:00:00 120
5 2002-01-14 15:30:00 165
Upvotes: 2