Reputation: 2670
Say I have a data frame:
import pandas as pd
df = pd.DataFrame({'A':[5,4,7,8,1,2,3,4,5,7,8,9],'B':[1,2,2,2,2,5,9,8,8,10,11,10]})
print(df)
     A   B
0    5   1
1    4   2
2    7   2
3    8   2
4    1   2
5    2   5
6    3   9
7    4   8
8    5   8
9    7  10
10   8  11
11   9  10
And I want to keep only the rows where df.A reaches a new high, i.e. is greater than every earlier value in df.A, so df would become:
     A   B
0    5   1
2    7   2
3    8   2
11   9  10
What is the best (read: fastest) way to do this? I have tried something quite complicated, but alas, it's actually slower than looping through the whole frame.
Thanks.
Upvotes: 0
Views: 52
Reputation: 57033
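This solution returns the original B values only if B is non-decreasing, because cummax() is applied to every column:
df.cummax().drop_duplicates('A')
#     A   B
#0    5   1
#2    7   2
#3    8   2
#11   9  11
In the sample data B is not monotonic, so the last row shows B = 11 (the running maximum of B) instead of the original 10.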
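This one is more general, because it takes the running maximum of A only and leaves B untouched:
df['C'] = df['A'].cummax()  # helper column: running maximum of A
df.drop_duplicates('C')[['A','B']]  # keep the first row at each new maximum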
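If you would rather not add a helper column to df, the same idea can probably be written as a boolean mask. The sketch below is only an illustration on the sample data from the question. The names running_max, mask, result and result_np are mine, and the NumPy variant assumes a non-empty frame. I have not timed either version, so measure on your own data before assuming one is faster.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [5, 4, 7, 8, 1, 2, 3, 4, 5, 7, 8, 9],
                   'B': [1, 2, 2, 2, 2, 5, 9, 8, 8, 10, 11, 10]})

# Running maximum of A. A row is a new high exactly when the running
# maximum takes a value it has not had before, so we keep the first
# occurrence of each running-max value and drop the repeats.
running_max = df['A'].cummax()
result = df[~running_max.duplicated()]

# NumPy variant (assumes df is not empty): compare each value with the
# maximum of everything before it. The first row is always kept.
a = df['A'].to_numpy()
prev_max = np.maximum.accumulate(a)[:-1]
mask = np.r_[True, a[1:] > prev_max]
result_np = df[mask]

print(result)
```

Both variants leave df unchanged and keep the original B values, so B does not need to be monotonic.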
Upvotes: 1