Reputation: 817
I have a dataframe with different dates and a set of IDs per date. I would like to group by date and then select the n smallest rows by a certain column (Amount) for each date. I am trying pandas functions like the ones below but am getting multiple errors.
This seems like a doable split-apply-combine problem. Could you please help me understand where I am going wrong?
df.groupby(['OriginalAmount']).apply(nsmallest(2))
df.groupby(['OriginalAmount']).nsmallest(2).groupby(level='OriginalAmount')
Below is a sample dataframe and my desired output when n_smallest equals 2
Dataframe:
Date ID Amount
0 2018-01-01 23 5
1 2018-01-01 45 10
2 2018-01-01 56 15
3 2018-01-02 23 5
4 2018-01-02 45 10
5 2018-01-02 99 3
6 2018-01-02 56 15
7 2018-01-03 23 3
8 2018-01-03 45 6
9 2018-01-03 56 19
10 2018-01-03 99 3
Desired Output:
Date ID Amount
1/1/2018 23 5
1/1/2018 45 10
1/2/2018 99 3
1/2/2018 23 5
1/3/2018 23 3
1/3/2018 99 3
Upvotes: 3
Views: 749
Reputation: 59549
You are indicating that you want to groupby 'Date', but then you are grouping by 'OriginalAmount', which doesn't even appear to be a column in your DataFrame.
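To make the point about the grouping key concrete, here is a minimal sketch that rebuilds the sample frame from the question (dates are stored as plain strings here, which is an assumption) and contrasts grouping by the missing column with grouping by 'Date'. The variable names are illustrative only.

```python
import pandas as pd

# Sample data copied from the question; dates kept as strings (assumption).
df = pd.DataFrame({
    "Date": ["2018-01-01"] * 3 + ["2018-01-02"] * 4 + ["2018-01-03"] * 4,
    "ID": [23, 45, 56, 23, 45, 99, 56, 23, 45, 56, 99],
    "Amount": [5, 10, 15, 5, 10, 3, 15, 3, 6, 19, 3],
})

# Grouping by a column that does not exist fails with a KeyError.
try:
    df.groupby(["OriginalAmount"])
    missing_column_error = None
except KeyError as exc:
    missing_column_error = exc

# Grouping by 'Date' and calling nsmallest on the 'Amount' column works.
# The result is a Series indexed by (Date, original row label).
smallest = df.groupby("Date")["Amount"].nsmallest(2)

# Use the original row labels to pull the full rows back out of df.
rows = df.loc[smallest.index.get_level_values(-1)].sort_index()
print(rows)
```

Note that nsmallest on the grouped column returns only the Amount values, so the extra lookup through df.loc is needed to recover the ID column.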
While you can use nsmallest, since you want to subset the original DataFrame you can sort by the Amount column and then just call .head (or .tail, if you sort in descending order) after you form groups based on 'Date'. Sort on the index at the end if you want the original order.
df.sort_values('Amount').groupby('Date').head(2).sort_index()
Output:
Date ID Amount
0 2018-01-01 23 5
1 2018-01-01 45 10
3 2018-01-02 23 5
5 2018-01-02 99 3
7 2018-01-03 23 3
10 2018-01-03 99 3
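For anyone who wants to reproduce this end to end, the following is a self-contained sketch of the same sort-then-head approach. The frame is rebuilt from the question's sample data with dates as plain strings (an assumption), and kind="stable" is an optional addition that only matters when tied amounts straddle the cutoff.

```python
import pandas as pd

# Sample data copied from the question; dates kept as strings (assumption).
df = pd.DataFrame({
    "Date": ["2018-01-01"] * 3 + ["2018-01-02"] * 4 + ["2018-01-03"] * 4,
    "ID": [23, 45, 56, 23, 45, 99, 56, 23, 45, 56, 99],
    "Amount": [5, 10, 15, 5, 10, 3, 15, 3, 6, 19, 3],
})

n = 2  # number of smallest rows to keep per date

result = (
    df.sort_values("Amount", kind="stable")  # smallest amounts first
      .groupby("Date")
      .head(n)                               # first n rows of each date group
      .sort_index()                          # restore the original row order
)
print(result)
```

Because head keeps the original row labels, the final sort_index is all that is needed to get back to the input order.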
Upvotes: 8