cpage
cpage

Reputation: 39

Sort doubly-grouped df by value column corresponding to only one of the groups

I have a pd.DataFrame that is the result of .groupby(['Product', 'Salesperson']).sum(). I would now like to sort the Product column in order of Sales by Product (not Sales by Product and Salesperson). Then inside each Product group sort by each Salesperson's Sales.

Here is my starting df:

sample

And here is my desired answer a1 with some notes to clarify the ordering process:

desired-answer

Below is my sample df and my desired answer a1 with a simple assertion test.

import pandas as pd
from pandas.util.testing import assert_frame_equal
import numpy as np

s1 = {'Product': {0: 'Soap',
                  1: 'Soap',
                  2: 'Pencil',
                  3: 'Paper',
                  4: 'Paper',
                  5: 'Bags',
                  6: 'Bags'},
      'Salesperson': {0: 'Jack',
                      1: 'Jill',
                      2: 'Jill',
                      3: 'Jack',
                      4: 'Barry',
                      5: 'Barry',
                      6: 'Jack'},
      'Sales': {0: 40, 1: 20, 2: 500, 3: 50, 4: 10, 5: 450, 6: 100}}

a1 = {'Product': {0: 'Bags',
                  1: 'Bags',
                  2: 'Pencil',
                  3: 'Paper',
                  4: 'Paper',
                  5: 'Soap',
                  6: 'Soap'},
      'Salesperson': {0: 'Barry',
                      1: 'Jack',
                      2: 'Jill',
                      3: 'Jack',
                      4: 'Barry',
                      5: 'Jack',
                      6: 'Jill'},
      'Sales': {0: 450, 1: 100, 2: 500, 3: 50, 4: 10, 5: 40, 6: 20}}

df = pd.DataFrame(s1).set_index(['Product', 'Salesperson']) # sample
a1 = pd.DataFrame(a1).set_index(['Product', 'Salesperson']) # desired answer

print(df)
print(a1)

def my_sort(df):
    raise NotImplementedError

my_answer = my_sort(df)

assert_frame_equal(my_answer, a1)

Upvotes: 1

Views: 46

Answers (1)

Mykola Zotko
Mykola Zotko

Reputation: 17884

You can groupby by 'Product' and create the dummy columns 'sum' and 'max', which you can use by sorting:

g = df.groupby('Product')['Sales']
df['sum'] = g.transform('sum')
df['max'] = g.transform('max')
df.sort_values(['sum', 'max', 'Sales'], ascending=False)\
.drop(['sum', 'max'], axis=1)

Output:

                     Sales
Product Salesperson       
Bags    Barry          450
        Jack           100
Pencil  Jill           500
Paper   Jack            50
        Barry           10
Soap    Jack            40
        Jill            20

Upvotes: 2

Related Questions