Tie_24
Tie_24

Reputation: 647

Select row with conditions column value

            A         B    C
0  2002-01-13  15:00:00  120
1  2002-01-13  15:30:00  110
2  2002-01-13  16:00:00  130
3  2002-01-13  16:30:00  140
4  2002-01-14  15:00:00  180
5  2002-01-14  15:30:00  165
6  2002-01-14  16:00:00  150
7  2002-01-14  16:30:00  170

I want to select one row per each A group, with next conditions:

Output should be:

            A         B    C
0  2002-01-13  15:00:00  120
5  2002-01-14  15:30:00  165

Upvotes: 0

Views: 58

Answers (2)

jpp
jpp

Reputation: 164613

This is one vectorised solution. Sometimes a helper column is more efficient than an in-line lambda-based solution.

df['Floor'] = df['C'] - (df.groupby('A')['C'].transform('min') + 10)

res = df.loc[df[df['Floor'] >= 0].groupby('A')['Floor'].idxmin()]

Result:

            A         B    C  Floor
0  2002-01-13  15:00:00  120      0
5  2002-01-14  15:30:00  165      5

Upvotes: 1

jezrael
jezrael

Reputation: 862406

As @Anton vBR commented, first remove rows by condition per groups and then get rows by minimal C by idxmax and select by loc:

df = df[df.groupby('A')['C'].transform(lambda x: x >= x.min() + 10)]
#filtering with  transform `min` only
#df = df[df.groupby('A')['C'].transform('min') + 10 <= df['C']]
print (df)
            A         B    C
0  2002-01-13  15:00:00  120
2  2002-01-13  16:00:00  130
3  2002-01-13  16:30:00  140
4  2002-01-14  15:00:00  180
5  2002-01-14  15:30:00  165
7  2002-01-14  16:30:00  170

df = df.loc[df.groupby('A')['C'].idxmin()]

What is same as:

idx=df.sort_values(['A','C']).groupby('A')['C'].apply(lambda x: (x >= x.min() + 10).idxmax())
df = df.loc[idx]

Alternative solution with sort_values with drop_duplicates:

df = df.sort_values(['A','C'])
df = df[df.groupby('A')['C'].transform(lambda x: x >= x.min() + 10)].drop_duplicates(['A'])

print (df)
            A         B    C
0  2002-01-13  15:00:00  120
5  2002-01-14  15:30:00  165

Upvotes: 2

Related Questions