Z_D

Reputation: 817

Pandas: group by date and find smallest n for each date

I have a dataframe with different dates and a set of IDs per date. I would like to group by date and then select the 3 rows with the smallest values in a certain column (Amount) for each date. I am trying Pandas functions like the ones below, but I am getting multiple errors.

This seems like a doable split-apply-combine problem. Could you please help me understand where I am going wrong?

df.groupby(['OriginalAmount']).apply(nsmallest(2))
df.groupby(['OriginalAmount']).nsmallest(2).groupby(level='OriginalAmount')

Below is a sample dataframe and my desired output when n_smallest equals 2

Dataframe:

         Date  ID  Amount
0  2018-01-01  23       5
1  2018-01-01  45      10
2  2018-01-01  56      15
3  2018-01-02  23       5
4  2018-01-02  45      10
5  2018-01-02  99       3
6  2018-01-02  56      15
7  2018-01-03  23       3
8  2018-01-03  45       6
9  2018-01-03  56      19
10 2018-01-03  99       3

Desired Output:

Date    ID  Amount
1/1/2018    23  5
1/1/2018    45  10
1/2/2018    99  3
1/2/2018    23  5
1/3/2018    23  3
1/3/2018    99  3

Upvotes: 3

Views: 749

Answers (1)

ALollz

Reputation: 59549

You say you want to group by 'Date', but your code groups by 'OriginalAmount', which doesn't even appear to be a column in your DataFrame.

While you can use nsmallest, since you want to subset the original DataFrame you can sort by the Amount column and then just call .head after you form groups based on 'Date' (or .tail if you sort in descending order instead). Sort on the index at the end if you want the original order.

df.sort_values('Amount').groupby('Date').head(2).sort_index()

Output:

          Date  ID  Amount
0   2018-01-01  23       5
1   2018-01-01  45      10
3   2018-01-02  23       5
5   2018-01-02  99       3
7   2018-01-03  23       3
10  2018-01-03  99       3
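
For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It rebuilds the sample frame from the question and compares the sort-then-head approach with a variant based on nsmallest. The variable names (n, via_head, via_nsmallest) are my own, and kind="mergesort" is an addition not in the one-liner above; it only makes the order of tied Amount values deterministic.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date": ["2018-01-01"] * 3 + ["2018-01-02"] * 4 + ["2018-01-03"] * 4,
    "ID": [23, 45, 56, 23, 45, 99, 56, 23, 45, 56, 99],
    "Amount": [5, 10, 15, 5, 10, 3, 15, 3, 6, 19, 3],
})

n = 2  # number of smallest rows to keep per date

# Approach 1: sort ascending by Amount, keep the first n rows of each Date
# group, then restore the original row order.
via_head = (
    df.sort_values("Amount", kind="mergesort")  # stable sort: ties keep row order
      .groupby("Date")
      .head(n)
      .sort_index()
)

# Approach 2: nsmallest on the grouped Amount column. The result is a Series
# whose last index level holds the original row labels, which can be used to
# select the matching rows from df.
picked = df.groupby("Date")["Amount"].nsmallest(n)
via_nsmallest = df.loc[picked.index.get_level_values(-1)].sort_index()

print(via_head)
```

Both approaches should select the same rows here. The head version avoids the extra index lookup, which is why I would usually reach for it first.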

Upvotes: 8
