Reputation: 2920
I have two columns in my dataset:
1) Supplier_code
2) Item_code
I have grouped them using:
data.groupby(['supplier_code', 'item_code']).size()
I get result like this:
supplier_code  item_code
591495         127018419     9
               547173046     1
3024466        498370473     1
               737511044     1
               941755892     1
6155238        875189969     1
13672569       53152664      1
               430351453     1
               573603000     1
               634275342     1
18510135       362522958     6
               405196476     6
               441901484    12
29222428       979575973     1
31381089       28119319      2
               468441742     3
               648079349    18
               941387936     1
I get my top 15 suppliers using:
import collections
import operator

supCounter = collections.Counter(datalist[3])  # count occurrences of each supplier code
supDic = dict(sorted(supCounter.items(), key=operator.itemgetter(1), reverse=True)[:15])
print(list(supDic.keys()))
This is my list of top 15 suppliers:
[723223131, 687164888, 594473706, 332379250, 203288669, 604236177,
533512754, 503134099, 982883317, 147405879, 151212120, 737780569, 561901243,
786265866, 79886783]
Now I want to combine the two, i.e. take the groupby result and keep only the top 15 suppliers and their item counts.
Kindly help me in figuring this out.
Upvotes: 1
Views: 7107
Reputation: 21284
IIUC, you can groupby supplier_code and then sum and sort_values. Take the top 15 and you're done.
For example, with:
gb_size = data.groupby(['supplier_code', 'item_code']).size()
Then:
N = 3 # change to 15 for actual data
gb_size.groupby("supplier_code").sum().sort_values(ascending=False).head(N)
Output:
supplier_code
31381089 24
18510135 24
591495 10
dtype: int64
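If you also need the per-item counts for those top suppliers (which is how I read "top 15 suppliers and their item counts"), one possible approach is to filter gb_size by the supplier level of its index. This is only a rough sketch: the toy frame below is made up for illustration, and the names top_suppliers, mask and top_item_counts are my own, not anything from your code.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `data` frame from the question.
data = pd.DataFrame({
    "supplier_code": [1, 1, 1, 2, 2, 3],
    "item_code": [10, 10, 11, 20, 21, 30],
})

N = 2  # change to 15 for actual data

# Row count per (supplier_code, item_code) pair, as in the question.
gb_size = data.groupby(["supplier_code", "item_code"]).size()

# Suppliers ranked by their total row count; keep the codes of the top N.
top_suppliers = (
    gb_size.groupby(level="supplier_code").sum()
    .sort_values(ascending=False)
    .head(N)
    .index
)

# Keep only the (supplier, item) rows whose supplier is among the top N.
mask = gb_size.index.get_level_values("supplier_code").isin(top_suppliers)
top_item_counts = gb_size[mask]
print(top_item_counts)
```

The result keeps the same two-level index as gb_size, so each remaining supplier still lists its item codes and counts.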
Upvotes: 3