Select rows from a dataframe, accounting for repeat values for a column

Question

Consider the following example dataframe:

import pandas as pd
df1=pd.DataFrame({'ID':[91,2,33,41,56,78,910,331],'Score':[97,91,92,84,95.6,92,89,90]})

I have a function that is called with df1 as one of its arguments. Among other things, the function needs to return the first n rows (say, 3 in this case) after sorting the dataframe on Score in descending order:

def sortandextract(df1, n1, ...):
    ...
    df1.sort_values(by=['Score'], ascending=[False], inplace=True)
    df2 = df1[:n1]
    ...
    return df2, len(df2)

This function is called many times with different df1s (these dataframes have hundreds of rows). Sometimes the scores are repeated (as above); other times they are not.

I want a pythonic method for returning n1+k rows, where k is the number of additional rows that share the score of the n1-th row. In the above example, I will have a problem if I make this function call:

sortandextract(df1,3)

I would want this function call to return 4 rows, because the third-highest score (92) appears twice.

BENY · Accepted Answer

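You can use rank with method='min', so that tied scores share the same rank, and keep every row whose rank is at most n: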

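n=3
# Boolean mask: descending rank of each Score, with ties given the lowest rank of their group
ret=df1[df1.Score.rank(method='min',ascending=False)<=n]
print(ret)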
   ID  Score
0  91   97.0
2  33   92.0
4  56   95.6
5  78   92.0
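
Note that the filtered result keeps the original row order rather than the sorted order. As a minimal sketch of how this could be folded back into the question's function: the name sort_and_extract and its two-argument signature are hypothetical simplifications (the original function takes further, elided arguments), and the nlargest call with keep='all' is an alternative that assumes a pandas version supporting that option.

```python
import pandas as pd

def sort_and_extract(df, n):
    """Return the top-n rows by Score, plus any rows tied with the n-th score."""
    # rank(method='min') gives tied scores the lowest rank of their group,
    # so every row tied with the n-th best score passes the filter.
    mask = df['Score'].rank(method='min', ascending=False) <= n
    # Sort the filtered copy instead of mutating the caller's dataframe.
    top = df[mask].sort_values(by='Score', ascending=False)
    return top, len(top)

df1 = pd.DataFrame({'ID': [91, 2, 33, 41, 56, 78, 910, 331],
                    'Score': [97, 91, 92, 84, 95.6, 92, 89, 90]})

top, count = sort_and_extract(df1, 3)
print(top)
print(count)

# Alternative: nlargest with keep='all' also retains rows tied with the n-th value.
alt = df1.nlargest(3, 'Score', keep='all')
print(alt)
```

Unlike the inplace sort in the question, this version leaves df1 unchanged, which may matter if the caller reuses the dataframe afterwards.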
