D_usv
D_usv

Reputation: 433

nlargest on groupby with multiindex and multiple agg column

Stuggling to apply .nlargest() to my groupedby data in order to only show the largest 10 by gross revenue per index[0]

Groupedby data looks like this:

Data currently

when I run:

grp_data.n_largest(10,'GrossRevenue_GBP')

doesn't seem to be working for me, full code snippet is below:

tmean = lambda x :stats.trim_mean(x, 0.1)

data = data.loc[(data['YYYY'] == 2016)&(data['New_category_ID'] != 0)]

grp_data = data.groupby(['New_category','CDI_CUS_NM'])['GrossRevenue_GBP',
'OrderCount',
'% Rev', 
'MOVC_GBP', 
'Average order size']
.aggregate({'GrossRevenue_GBP':np.sum, 'OrderCount':np.sum,'% Rev': np.sum,'MOVC_GBP': tmean ,'Average order size': tmean })
.nlargest(10,'GrossRevenue_GBP')


grp_data['Country'] = 'EU'


key1 = grp_data.index.labels[0]
key2 = grp_data['GrossRevenue_GBP'].rank(ascending=False)
sorter = np.lexsort((key2, key1))

grp_data = grp_data.take(sorter)


grp_data = grp_data[['% Rev','GrossRevenue_GBP', 'MOVC_GBP','Average order size','OrderCount','Country']]

Would really appreciate some help.

Thanks,

Upvotes: 2

Views: 1819

Answers (1)

jezrael
jezrael

Reputation: 862701

I think you need groupby first Multiindex level and apply function with nlargest:

grp_data = data.groupby(['New_category','CDI_CUS_NM']) 
               .aggregate({'GrossRevenue_GBP':np.sum, 
                           'OrderCount':np.sum,
                           '% Rev': np.sum,
                           'MOVC_GBP': tmean ,
                           'Average order size': tmean })

df = grp_data.groupby('New_category')
             .apply(lambda x: x.nlargest(1,'GrossRevenue_GBP'))
             .reset_index(level=0, drop=True)

Upvotes: 1

Related Questions