return the n largest/smallest values in pandas dataframe where many rows contain same values

Question

I am wondering how to return the rows containing n smallest values in a dataframe df that looks like this.

id           xx             count
1            A              1
2            B              1
3            C              3
4            D              2
5            E              3
6            F              10
7            G              11
8            H              17

Say I want to find the rows containing the 3 smallest counts (in this case, the 3 smallest counts are 1,2 and 3). So, I want the answer to be like this:

id           xx             count
    1            A              1
    2            B              1
    4            D              2
    3            C              3
    5            E              3

If I simply sort the dataframe based on count and use df.nsmallest(3, 'count'), it would only return the first three rows of the desired dataframe. But I want all rows that contain the 3 smallest counts. Is there any simpler way to do this in pandas? Thanks in advance!

jezrael · Accepted Answer

You can first drop_duplicates with nsmallest for finding values and then boolean indexing with isin:

s = df['count'].drop_duplicates().nsmallest(3)
print (s)
0    1
3    2
2    3
Name: count, dtype: int64

print (df[df['count'].isin(s)])
   id xx  count
0   1  A      1
1   2  B      1
2   3  C      3
3   4  D      2
4   5  E      3

Another solution with unique, sorting by numpy.sort (because output of unique in numpy array) and selecting first 3 values:

arr = np.sort(df['count'].unique())[:3]
print (arr)
[1 2 3]

print (df[df['count'].isin(arr)])
   id xx  count
0   1  A      1
1   2  B      1
2   3  C      3
3   4  D      2
4   5  E      3

return the n largest/smallest values in pandas dataframe where many rows contain same values

Answers (2)

Related Questions