Calculate maximum of next 3 rows of a particular column in each row in Python

Question

I have a dataframe like this:

    seq                         score
0   TAAGAATTGTTCTCTGTGTATTT     -23.19
1   AAGAATTGTTCTCTGTGTATTTC     -3.67
2   AGAATTGTTCTCTGTGTATTTCA     -16.49
3   GAATTGTTCTCTGTGTATTTCAG     -11.83
4   AATTGTTCTCTGTGTATTTCAGG     -10.86
5   ATTGTTCTCTGTGTATTTCAGGC     -7.24

I want to go through the rows in groups of 3 and, for each group, get the row with the maximum score.

The result I am looking for is like this:


    seq                          score
1   AAGAATTGTTCTCTGTGTATTTC     -3.67
5   ATTGTTCTCTGTGTATTTCAGGC     -7.24

I tried applying the groupby function and sorting, but it does not seem to work because the seq column has unique values.

What other method can I use to get this result?

jezrael · Accepted Answer

Use DataFrameGroupBy.idxmax to get the index of the maximum value in each group, where the groups are created by integer division of the index by 3, and then select those rows with DataFrame.loc:

df = df.loc[df.groupby(df.index // 3)['score'].idxmax()]
print (df)
                       seq  score
1  AAGAATTGTTCTCTGTGTATTTC  -3.67
5  ATTGTTCTCTGTGTATTTCAGGC  -7.24
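
For a self-contained check, the sketch below rebuilds the sample frame from the question and applies the same expression. The names `groups` and `result` are illustrative only, and it assumes the default 0..n-1 integer index shown in the question.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "seq": [
        "TAAGAATTGTTCTCTGTGTATTT",
        "AAGAATTGTTCTCTGTGTATTTC",
        "AGAATTGTTCTCTGTGTATTTCA",
        "GAATTGTTCTCTGTGTATTTCAG",
        "AATTGTTCTCTGTGTATTTCAGG",
        "ATTGTTCTCTGTGTATTTCAGGC",
    ],
    "score": [-23.19, -3.67, -16.49, -11.83, -10.86, -7.24],
})

# Integer division labels rows 0-2 as group 0 and rows 3-5 as group 1
groups = df.index // 3

# idxmax gives, per group, the index label of the row with the highest score;
# loc then pulls those full rows out of the original frame
result = df.loc[df.groupby(groups)["score"].idxmax()]
print(result)
```

Assigning to a new name such as `result` keeps the original `df` available for further inspection.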

Details (run on the original df, before the reassignment above):

print (df.index // 3)
Int64Index([0, 0, 0, 1, 1, 1], dtype='int64')

print (df.groupby(df.index // 3)['score'].idxmax())
0    1
1    5
Name: score, dtype: int64
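
Note that `df.index // 3` relies on the index being consecutive integers starting at 0. If the frame has been filtered, or carries non-numeric labels, one possible variation is to group by row position instead. The sketch below assumes a hypothetical frame with letter labels and uses `np.arange(len(df))` for the positions; it is an illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same scores as in the question, but with hypothetical non-numeric labels
df = pd.DataFrame(
    {"score": [-23.19, -3.67, -16.49, -11.83, -10.86, -7.24]},
    index=list("abcdef"),
)

# Group by row position (0, 0, 0, 1, 1, 1) rather than by index label
positions = np.arange(len(df)) // 3

# idxmax still returns index labels, so loc can select the matching rows
result = df.loc[df.groupby(positions)["score"].idxmax()]
print(result)
```

Alternatively, calling `df.reset_index(drop=True)` first restores a 0-based index so the original expression applies unchanged.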
