Reputation: 2122
I am wondering how to return the rows containing n smallest values in a dataframe df that looks like this.
id xx count
1 A 1
2 B 1
3 C 3
4 D 2
5 E 3
6 F 10
7 G 11
8 H 17
Say I want to find the rows containing the 3 smallest counts (in this case, the 3 smallest counts are 1,2 and 3). So, I want the answer to be like this:
id xx count
1 A 1
2 B 1
4 D 2
3 C 3
5 E 3
If I simply sort the dataframe based on count and use df.nsmallest(3, 'count')
, it would only return the first three rows of the desired dataframe. But I want all rows that contain the 3 smallest counts. Is there any simpler way to do this in pandas? Thanks in advance!
Upvotes: 2
Views: 3930
Reputation: 210882
Another solution - using rank() method:
In [43]: df[df['count'].rank(method='dense') <= 3]
Out[43]:
id xx count
0 1 A 1
1 2 B 1
2 3 C 3
3 4 D 2
4 5 E 3
Upvotes: 3
Reputation: 863166
You can first drop_duplicates
with nsmallest
for finding values and then boolean indexing
with isin
:
s = df['count'].drop_duplicates().nsmallest(3)
print (s)
0 1
3 2
2 3
Name: count, dtype: int64
print (df[df['count'].isin(s)])
id xx count
0 1 A 1
1 2 B 1
2 3 C 3
3 4 D 2
4 5 E 3
Another solution with unique
, sorting by numpy.sort
(because output of unique
in numpy array
) and selecting first 3 values:
arr = np.sort(df['count'].unique())[:3]
print (arr)
[1 2 3]
print (df[df['count'].isin(arr)])
id xx count
0 1 A 1
1 2 B 1
2 3 C 3
3 4 D 2
4 5 E 3
Upvotes: 5