Pandas keep highest value in every n consecutive rows

Question

I have a pandas dataframe called df_initial with two columns 'a' and 'b' and N rows.

I would like to half the rows number, deleting the row where the value of 'b' is lower.

Thus between row 0 and row 1 I will keep row 1, between row 2 and row 3 I will keep row 3 etc..

This is the result that I would like to obtain:

print(df_initial)
         a     b
0     0.04  0.01
1     0.05  0.22
2     0.06  0.34
3     0.07  0.49
4     0.08  0.71
5     0.09  0.09
6     0.10  0.98
7     0.11  0.42
8     0.12  1.32
9     0.13  0.39
10    0.14  0.97
11    0.15  0.05
12    0.16  0.36
13    0.17  1.72
....

print(df_reduced)
         a     b
0     0.05  0.22
1     0.07  0.49
2     0.08  0.71
3     0.10  0.98
4     0.12  1.32
5     0.14  0.97
6     0.17  1.72
....

Is there some Pandas function to do this ?

I saw that there is a resample function, DataFrame.resample() , but it is valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, so not in this case.

Thanks who will help me

yatu · Accepted Answer

You can groupby every two rows (a simple way of doing so is taking the floor division of the index) and take the idxmax of column b to index the dataframe:

df.loc[df.groupby(df.index//2).b.idxmax(), :]

    a     b
0  0.05  0.22
1  0.07  0.49
2  0.09  0.71
3  0.11  0.98
4  0.13  1.32
5  0.15  0.97
6  0.17  1.72

Or using DataFrame.rolling:

df.loc[df.b.rolling(2).max()[1::2].index, :]

Pandas keep highest value in every n consecutive rows

Answers (2)

Related Questions