jones5322
jones5322

Reputation: 23

Extract row with maximum value in DataFrameGroupBy

Newbie trying to break my addiction to excel. I have a data set of paid invoices with the vendor and country where it was paid along with the amount. I want know for each vendor, which country they have the greatest invoice amount and what percentage of their total business is in that country. Using this data set I want the result to be:

Desired output

import pandas as pd
import numpy as np
df = pd.DataFrame({'Company' : ['bar','foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country' : ['two','one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount' : [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
    'Pct' : [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]})
CoCntry = df.groupby(['Company', 'Country'])
CoCntry.aggregate(np.sum)

After looking at multiple examples including: Extract row with max value and Getting max value using groupby

2: Python : Getting the Row which has the max value in groups using groupby I've gotten as far as creating a DataFrameGroupBy summarizing the invoice data by country. I’m struggling with how to find the max row. After which I must figure out how to calculate the percent. Advice welcome.

Upvotes: 2

Views: 295

Answers (2)

jezrael
jezrael

Reputation: 863361

You can use transform for return Series Pct of summed values per groups by first level Company. Then filter Dataframe by max value per groups with idxmax and last divide Amount column with Series Pct:

g = CoCntry.groupby(level='Company')['Amount']
Pct = g.transform('sum')
print (Pct)
Company  Country
bar      one        25
         three      25
         two        25
foo      one        28
         three      28
         two        28
Name: Amount, dtype: int64

CoCntry  = CoCntry.loc[g.idxmax()]
print (CoCntry)
                 Amount  Pct
Company Country             
bar     one          11    0
foo     two          11    0

CoCntry.Pct = CoCntry.Amount.div(Pct)
print (CoCntry.reset_index())
  Company Country  Amount       Pct
0     bar     one      11  0.440000
1     foo     two      11  0.392857

Similar another solution:

CoCntry = df.groupby(['Company', 'Country']).Amount.sum()
print (CoCntry)
Company  Country
bar      one        11
         three       4
         two        10
foo      one         9
         three       8
         two        11
Name: Amount, dtype: int64

g =  CoCntry.groupby(level='Company')
Pct = g.sum()
print (Pct)
Company
bar    25
foo    28
Name: Amount, dtype: int64

maxCoCntry  = CoCntry.loc[g.idxmax()].to_frame()
maxCoCntry['Pct'] = maxCoCntry.Amount.div(Pct, level=0)
print (maxCoCntry.reset_index())

  Company Country  Amount       Pct
0     bar     one      11  0.440000
1     foo     two      11  0.392857

Upvotes: 2

piRSquared
piRSquared

Reputation: 294506

setup

df = pd.DataFrame({'Company' : ['bar','foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo', 'bar'],
    'Country' : ['two','one', 'one', 'two', 'three', 'two', 'two', 'one', 'three', 'one'],
    'Amount' : [4, 2, 2, 6, 4, 5, 6, 7, 8, 9],
    })

solution

# sum total invoice per country per company
comp_by_country = df.groupby(['Company', 'Country']).Amount.sum()

# sum total invoice per company
comp_totals = df.groupby('Company').Amount.sum()

# percent of per company per country invoice relative to company
comp_by_country_pct = comp_by_country.div(comp_totals).rename('Pct')

answer to OP question
Which 'Country' has greatest total invoice for 'Company' and what percentage of that companies total business.

comp_by_country_pct.loc[
    comp_by_country_pct.groupby(level=0).idxmax()
].reset_index()

  Company Country       Pct
0     bar     one  0.440000
1     foo     two  0.392857

Upvotes: 2

Related Questions