Reputation: 1477
I have a dataframe like this:
seq score
0 TAAGAATTGTTCTCTGTGTATTT -23.19
1 AAGAATTGTTCTCTGTGTATTTC -3.67
2 AGAATTGTTCTCTGTGTATTTCA -16.49
3 GAATTGTTCTCTGTGTATTTCAG -11.83
4 AATTGTTCTCTGTGTATTTCAGG -10.86
5 ATTGTTCTCTGTGTATTTCAGGC -7.24
I want to go through the rows in consecutive groups of 3 and, from each group, get the row with the maximum score.
The result I am looking for is like this:
seq score
1 AAGAATTGTTCTCTGTGTATTTC -3.67
5 ATTGTTCTCTGTGTATTTCAGGC -7.24
I tried applying groupby function and sort, but it does not seem to work as the seq column has unique values.
What other method can I use to get such result?
Upvotes: 2
Views: 410
Reputation: 1956
import pandas as pd
df = pd.DataFrame({
    'seq': ['TAAGAATTGTTCTCTGTGTATTT', 'AAGAATTGTTCTCTGTGTATTTC', 'AGAATTGTTCTCTGTGTATTTCA',
            'GAATTGTTCTCTGTGTATTTCAG', 'AATTGTTCTCTGTGTATTTCAGG', 'ATTGTTCTCTGTGTATTTCAGGC'],
    'score': [-23.19, -3.67, -16.49, -11.83, -10.86, -7.24],
})
# index // 3 labels rows 0-2 as group 0 and rows 3-5 as group 1;
# idxmax then gives the index of the highest score in each group.
df = df.loc[df.groupby(df.index // 3)['score'].idxmax()]
print(df)
Upvotes: 2
Reputation: 862671
Use DataFrameGroupBy.idxmax to get the index of the maximum value in each group, where the groups are created by integer division of the index by 3, and then select those rows with DataFrame.loc:
df = df.loc[df.groupby(df.index // 3)['score'].idxmax()]
print (df)
seq score
1 AAGAATTGTTCTCTGTGTATTTC -3.67
5 ATTGTTCTCTGTGTATTTCAGGC -7.24
Details:
print (df.index // 3)
Int64Index([0, 0, 0, 1, 1, 1], dtype='int64')
print (df.groupby(df.index // 3)['score'].idxmax())
0 1
1 5
Name: score, dtype: int64
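Note that df.index // 3 relies on the default 0, 1, 2, ... index. As a rough sketch for the case where the index is something else, the groups can be built from row positions instead. The helper name max_row_per_chunk and the letter index below are made up for illustration, and the sketch assumes the index labels are unique:

```python
import numpy as np
import pandas as pd


def max_row_per_chunk(df, size, column):
    """Return the row with the largest `column` value in each consecutive chunk of `size` rows."""
    # Group by row position rather than by index label,
    # so this also works when the index is not 0, 1, 2, ...
    chunk_ids = np.arange(len(df)) // size
    # idxmax gives the index label of the first maximum in each chunk.
    best_labels = df.groupby(chunk_ids)[column].idxmax()
    return df.loc[best_labels]


df = pd.DataFrame(
    {
        "seq": [
            "TAAGAATTGTTCTCTGTGTATTT",
            "AAGAATTGTTCTCTGTGTATTTC",
            "AGAATTGTTCTCTGTGTATTTCA",
            "GAATTGTTCTCTGTGTATTTCAG",
            "AATTGTTCTCTGTGTATTTCAGG",
            "ATTGTTCTCTGTGTATTTCAGGC",
        ],
        "score": [-23.19, -3.67, -16.49, -11.83, -10.86, -7.24],
    },
    index=list("abcdef"),  # hypothetical non-numeric index, for illustration only
)

result = max_row_per_chunk(df, 3, "score")
print(result)
```

With the sample data this should pick the same two sequences as above, now labelled b and f. If a chunk contains tied maximum scores, idxmax keeps the first one.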
Upvotes: 2