Consider the following example DataFrame:
import pandas as pd
df1 = pd.DataFrame({'ID': [91, 2, 33, 41, 56, 78, 910, 331], 'Score': [97, 91, 92, 84, 95.6, 92, 89, 90]})
I have a function that takes df1 as one of its arguments. Among other things, the function must return the first n rows (here, n = 3) after sorting df1 by Score in descending order:
def sortandextract(df1, n1, ...):
    ...
    df1.sort_values(by=['Score'], ascending=[False], inplace=True)
    df2 = df1[:n1]
    ...
    return (df2, len(df2))
This function is called many times with different df1s, each with hundreds of rows. Sometimes the scores contain ties (as above); other times they do not.
I want a Pythonic way to return n1 + k rows, where k is the number of additional rows whose Score ties with the n1-th score after sorting. With the example above, the following call is a problem:
sortandextract(df1,3)
After sorting, the 3rd and 4th highest scores are both 92, so I would want this call to return 4 rows.
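A minimal sketch of how k can be computed for the example above: sort the scores (without `inplace=True`, so df1 itself is not reordered), take the n1-th score, and count how many of the remaining rows tie with it. The variable names `scores`, `last_score` and `k` are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [91, 2, 33, 41, 56, 78, 910, 331],
                    'Score': [97, 91, 92, 84, 95.6, 92, 89, 90]})
n1 = 3

# Sorted copy of the Score column; df1 is left untouched
scores = df1['Score'].sort_values(ascending=False)
print(scores.tolist())  # [97.0, 95.6, 92.0, 92.0, 91.0, 90.0, 89.0, 84.0]

# Score of the n1-th row after sorting
last_score = scores.iloc[n1 - 1]

# k = rows beyond position n1 that tie with that score
k = int((scores.iloc[n1:] == last_score).sum())
print(last_score, k, n1 + k)  # 92.0 1 4
```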
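We can use rank with method='min', which gives tied values the same (lowest) rank, and keep every row whose rank is at most n: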
n=3
ret=df1[df1.Score.rank(method='min',ascending=False)<=n]
ID Score
0 91 97.0
2 33 92.0
4 56 95.6
5 78 92.0
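Note that the result above keeps the original row order rather than sorting by Score. As a hedged alternative, pandas' `DataFrame.nlargest` with `keep='all'` also keeps every row tied with the smallest kept value, and returns the rows sorted in descending order. The sketch below wraps it in a hypothetical helper `top_n_with_ties` (not the asker's `sortandextract`) that returns the same `(df2, len(df2))` pair, and checks that it selects the same rows as the rank approach.

```python
import pandas as pd

def top_n_with_ties(df, n, col='Score'):
    # Hypothetical helper: top-n rows by `col`, plus any rows tied with the
    # n-th value, so the result may contain more than n rows.
    # nlargest returns a new DataFrame; the caller's df is not modified.
    df2 = df.nlargest(n, col, keep='all')
    return df2, len(df2)

df1 = pd.DataFrame({'ID': [91, 2, 33, 41, 56, 78, 910, 331],
                    'Score': [97, 91, 92, 84, 95.6, 92, 89, 90]})

df2, count = top_n_with_ties(df1, 3)
print(df2)
print(count)  # 4

# The rank-based filter picks the same rows (in original index order)
ranked = df1[df1['Score'].rank(method='min', ascending=False) <= 3]
print(set(ranked.index) == set(df2.index))  # True
```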