Getting Top N results from pandas groupby

Question

I have two columns in my dataset:

1) supplier_code

2) item_code

I have grouped them using:

data.groupby(['supplier_code', 'item_code']).size()

I get a result like this:

supplier_code  item_code
591495         127018419     9
               547173046     1
3024466        498370473     1
               737511044     1
               941755892     1
6155238        875189969     1
13672569       53152664      1
               430351453     1
               573603000     1
               634275342     1
18510135       362522958     6
               405196476     6
               441901484    12
29222428       979575973     1
31381089       28119319      2
               468441742     3
               648079349    18
               941387936     1

I get my top 15 suppliers using:

import collections

supCounter = collections.Counter(datalist[3])
supDic = dict(supCounter.most_common(15))  # the 15 most frequent suppliers
print(list(supDic.keys()))

This is my list of top 15 suppliers:

[723223131, 687164888, 594473706, 332379250, 203288669, 604236177, 
533512754, 503134099, 982883317, 147405879, 151212120, 737780569, 561901243, 
786265866, 79886783]

Now I want to combine the two, i.e. apply the groupby but keep only the top 15 suppliers and their item counts.

Kindly help me in figuring this out.

andrew_reece · Accepted Answer

IIUC, you can group the sizes by supplier_code, sum them, and sort with sort_values. Take the top 15 and you're done.

For example, with:

gb_size = data.groupby(['supplier_code', 'item_code']).size()

Then:

N = 3 # change to 15 for actual data
gb_size.groupby("supplier_code").sum().sort_values(ascending=False).head(N)

Output:

supplier_code
31381089    24
18510135    24
591495      10
dtype: int64
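
If you also want the per-item counts for those top suppliers (which is how I read "join the two"), one possible approach is to take the index of the top N suppliers and use it to filter the original grouped Series. This is only a sketch: the sample frame below is made up for illustration, since the real `data` isn't shown in the question, and the names `top_suppliers` and `top_item_counts` are mine.

```python
import pandas as pd

# Hypothetical sample data standing in for the real `data` frame.
data = pd.DataFrame({
    "supplier_code": [591495] * 3 + [3024466] + [18510135] * 4 + [31381089] * 2,
    "item_code": [
        127018419, 127018419, 547173046,             # supplier 591495
        498370473,                                   # supplier 3024466
        362522958, 441901484, 441901484, 441901484,  # supplier 18510135
        648079349, 468441742,                        # supplier 31381089
    ],
})

N = 2  # change to 15 for actual data

# Row count per (supplier, item) pair, as in the question.
gb_size = data.groupby(["supplier_code", "item_code"]).size()

# Total count per supplier; keep the codes of the N largest.
top_suppliers = gb_size.groupby(level="supplier_code").sum().nlargest(N).index

# Keep only the (supplier, item) rows whose supplier is in the top N.
mask = gb_size.index.get_level_values("supplier_code").isin(top_suppliers)
top_item_counts = gb_size[mask]
print(top_item_counts)
```

The result keeps the same two-level index as `gb_size`, so each remaining supplier still shows its individual item counts.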
