rshar

Reputation: 1477

Calculate maximum of next 3 rows of a particular column in each row in Python

I have a dataframe like this:

    seq                         score
0   TAAGAATTGTTCTCTGTGTATTT     -23.19
1   AAGAATTGTTCTCTGTGTATTTC     -3.67
2   AGAATTGTTCTCTGTGTATTTCA     -16.49
3   GAATTGTTCTCTGTGTATTTCAG     -11.83
4   AATTGTTCTCTGTGTATTTCAGG     -10.86
5   ATTGTTCTCTGTGTATTTCAGGC     -7.24

I want to split the rows into consecutive groups of 3 and, for each group, keep the row with the maximum score.

The result I am looking for is like this:


    seq                          score
1   AAGAATTGTTCTCTGTGTATTTC     -3.67
5   ATTGTTCTCTGTGTATTTCAGGC     -7.24

I tried applying groupby and sorting, but it does not work because every value in the seq column is unique.

What other method can I use to get such result?

Upvotes: 2

Views: 410

Answers (2)

Chrisvdberge

Reputation: 1956

import pandas as pd

df = pd.DataFrame({'seq':['TAAGAATTGTTCTCTGTGTATTT','AAGAATTGTTCTCTGTGTATTTC','AGAATTGTTCTCTGTGTATTTCA','GAATTGTTCTCTGTGTATTTCAG','AATTGTTCTCTGTGTATTTCAGG','ATTGTTCTCTGTGTATTTCAGGC'],
                   'score': [-23.19,-3.67,-16.49,-11.83,-10.86,-7.24]})
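# Integer division of the default 0..n-1 index by 3 labels rows 0-2 as group 0, rows 3-5 as group 1, and so on.
# idxmax returns the index label of the highest score in each group; loc then selects those rows.
df = df.loc[df.groupby(df.index // 3)['score'].idxmax()]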
print(df)

Upvotes: 2

jezrael

Reputation: 862671

Use DataFrameGroupBy.idxmax to get the index of the max value per group, where the groups are created by integer division of the index by 3, and then select those rows with DataFrame.loc:

df = df.loc[df.groupby(df.index // 3)['score'].idxmax()]
print (df)
                       seq  score
1  AAGAATTGTTCTCTGTGTATTTC  -3.67
5  ATTGTTCTCTGTGTATTTCAGGC  -7.24

Details:

print (df.index // 3)
Int64Index([0, 0, 0, 1, 1, 1], dtype='int64')

print (df.groupby(df.index // 3)['score'].idxmax())
0    1
1    5
Name: score, dtype: int64
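
Note that df.index // 3 relies on the default 0..n-1 integer index. If the frame has been filtered, sorted, or carries non-integer labels, one possible variant is to build the group ids from row positions instead. This is only a sketch: the letter index and the names block and best are made up for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question, but with a hypothetical non-numeric index
# to show that the grouping no longer depends on the index values.
df = pd.DataFrame(
    {
        "seq": [
            "TAAGAATTGTTCTCTGTGTATTT",
            "AAGAATTGTTCTCTGTGTATTTC",
            "AGAATTGTTCTCTGTGTATTTCA",
            "GAATTGTTCTCTGTGTATTTCAG",
            "AATTGTTCTCTGTGTATTTCAGG",
            "ATTGTTCTCTGTGTATTTCAGGC",
        ],
        "score": [-23.19, -3.67, -16.49, -11.83, -10.86, -7.24],
    },
    index=list("abcdef"),
)

# Row positions 0..n-1, integer-divided by 3, give one group id per block of 3 rows.
block = np.arange(len(df)) // 3

# idxmax returns the index label of the max score in each block; loc selects those rows.
best = df.loc[df.groupby(block)["score"].idxmax()]
print(best)
```

With this data the selected rows are the ones labelled b and f, i.e. the same two sequences as above. If the number of rows is not a multiple of 3, the last block simply contains fewer rows.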

Upvotes: 2
