Brian
Brian

Reputation: 2286

Groupby top n records based on another column

Below is an example of the data set I am working with. I am trying to do a group by on Location for only the top 3 locations based on KGs.

Location KG Dollars
BKK     7   2
BKK     5   3
BKK     4   2
BKK     3   3
BKK     2   2
HKG     8   3
HKG     6   2
HKG     4   3
HKG     3   2
HKG     2   3

The output would look like below. Grouped on the location, summing both KG and Dollars for the top 3 KG records for each location.

Location KG Dollars
BKK     16  7
HKG     18  8

I've tried different types of groupbys, just having a problem only specifying the top n KG records for the groupby.

Upvotes: 1

Views: 57

Answers (1)

Zero
Zero

Reputation: 77027

You could do

In [610]: df.groupby('Location').apply(lambda x: 
               x.nlargest(3, 'KG')[['KG', 'Dollars']].sum())
Out[610]:
          KG  Dollars
Location
BKK       16        7
HKG       18        8

Upvotes: 3

Related Questions