Reputation: 171

Summarize values in pandas data frames

I want to find the maximum revenue for each year and show that value together with the sector it belongs to. For example, from the screenshot, I would like to display:

2010: Telecom 781
2011: Tech 973

I have tried using: df.groupby(['Year', 'Sector'])['Revenue'].max()

but this does not give me the name of the Sector that has the highest value in each year.

Upvotes: 2

Answers (2)

ALollz

Reputation: 59579

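You can also use .sort_values + .tail, grouping on just Year. Sorting by Revenue puts each year's largest row last in its group, so .tail(1) keeps it. Data from @Scott Boston: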

df.sort_values('Revenue').groupby('Year').tail(1)

Output:

               Sector  Year  Revenue
9          Heath Care  2014      224
3        Construction  2013      423
1                Tech  2011      466
12  Financial Service  2012      838
5             Telecom  2010      843
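
To see the mechanics on data where sectors span several years, here is a minimal sketch on a small hypothetical frame (the figures are made up for illustration and are not the asker's data). It adds a final sort so the years read in order, which the one-liner above does not do:

```python
import pandas as pd

# Hypothetical data: every sector reports revenue in every year.
df = pd.DataFrame({
    'Sector': ['Telecom', 'Tech', 'Construction'] * 2,
    'Year': [2010, 2010, 2010, 2011, 2011, 2011],
    'Revenue': [781, 540, 312, 655, 973, 401],
})

# Ascending sort puts each year's largest Revenue last within its group,
# so tail(1) keeps exactly one row per year: the maximum.
top = df.sort_values('Revenue').groupby('Year').tail(1)

# tail() keeps the sorted-by-Revenue row order; re-sort by Year for display.
top = top.sort_values('Year')
print(top)
```

If two sectors tie for the highest revenue in a year, this keeps only one of them, namely whichever the sort happens to place last.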

Upvotes: 2

Scott Boston

Reputation: 153550

Try using idxmax and loc:

df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]

MCVE:

import pandas as pd
import numpy as np

np.random.seed(123)
df = pd.DataFrame({'Sector':['Telecom','Tech','Financial Service','Construction','Heath Care']*3,
                   'Year':[2010,2011,2012,2013,2014]*3,
                   'Revenue':np.random.randint(101,999,15)})

df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]

Output:

               Sector  Year  Revenue
3        Construction  2013      423
12  Financial Service  2012      838
9          Heath Care  2014      224
1                Tech  2011      466
5             Telecom  2010      843
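
Note that in the sample data above each sector appears in exactly one year, so grouping on both columns and grouping on Year alone select the same rows. When sectors span several years, grouping on Year alone is what returns a single winner per year. A sketch under that assumption, using made-up figures chosen to match the example in the question:

```python
import pandas as pd

# Hypothetical data: every sector reports revenue in every year.
df = pd.DataFrame({
    'Sector': ['Telecom', 'Tech', 'Construction'] * 2,
    'Year': [2010, 2010, 2010, 2011, 2011, 2011],
    'Revenue': [781, 540, 312, 655, 973, 401],
})

# For each Year, idxmax returns the index label of the row with the
# highest Revenue; loc then pulls those full rows, Sector included.
best = df.loc[df.groupby('Year')['Revenue'].idxmax()]

# Format each row the way the question asks for it.
lines = [f"{row.Year}: {row.Sector} {row.Revenue}"
         for row in best.itertuples(index=False)]
print("\n".join(lines))
# 2010: Telecom 781
# 2011: Tech 973
```

Because groupby sorts its keys by default, the rows come back ordered by Year without an extra sort.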

Upvotes: 2
