Reputation: 433
Stuggling to apply .nlargest() to my groupedby data in order to only show the largest 10 by gross revenue per index[0]
Groupedby data looks like this:
when I run:
grp_data.n_largest(10,'GrossRevenue_GBP')
doesn't seem to be working for me, full code snippet is below:
tmean = lambda x :stats.trim_mean(x, 0.1)
data = data.loc[(data['YYYY'] == 2016)&(data['New_category_ID'] != 0)]
grp_data = data.groupby(['New_category','CDI_CUS_NM'])['GrossRevenue_GBP',
'OrderCount',
'% Rev',
'MOVC_GBP',
'Average order size']
.aggregate({'GrossRevenue_GBP':np.sum, 'OrderCount':np.sum,'% Rev': np.sum,'MOVC_GBP': tmean ,'Average order size': tmean })
.nlargest(10,'GrossRevenue_GBP')
grp_data['Country'] = 'EU'
key1 = grp_data.index.labels[0]
key2 = grp_data['GrossRevenue_GBP'].rank(ascending=False)
sorter = np.lexsort((key2, key1))
grp_data = grp_data.take(sorter)
grp_data = grp_data[['% Rev','GrossRevenue_GBP', 'MOVC_GBP','Average order size','OrderCount','Country']]
Would really appreciate some help.
Thanks,
Upvotes: 2
Views: 1819
Reputation: 862701
I think you need groupby
first Multiindex level and apply function with nlargest
:
grp_data = data.groupby(['New_category','CDI_CUS_NM'])
.aggregate({'GrossRevenue_GBP':np.sum,
'OrderCount':np.sum,
'% Rev': np.sum,
'MOVC_GBP': tmean ,
'Average order size': tmean })
df = grp_data.groupby('New_category')
.apply(lambda x: x.nlargest(1,'GrossRevenue_GBP'))
.reset_index(level=0, drop=True)
Upvotes: 1