Reputation: 171
I want to calculate the maximum value for each year and show the sector and that value. For example, from the screenshot, I would like to display: 2010: Telecom 781 2011: Tech 973
I have tried using: df.groupby(['Year', 'Sector'])['Revenue'].max()
but this does not give me the name of Sector which has the highest value.
Upvotes: 2
Views: 35
Reputation: 59579
Also .sort_values
+ .tail
, grouping on just year. Data from @Scott Boston
df.sort_values('Revenue').groupby('Year').tail(1)
Output:
Sector Year Revenue
9 Heath Care 2014 224
3 Construction 2013 423
1 Tech 2011 466
12 Financial Service 2012 838
5 Telecom 2010 843
Upvotes: 2
Reputation: 153550
Try using idxmax
and loc
:
df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]
MVCE:
import pandas as pd
import numpy as np
np.random.seed(123)
df = pd.DataFrame({'Sector':['Telecom','Tech','Financial Service','Construction','Heath Care']*3,
'Year':[2010,2011,2012,2013,2014]*3,
'Revenue':np.random.randint(101,999,15)})
df.loc[df.groupby(['Sector','Year'])['Revenue'].idxmax()]
Output:
Sector Year Revenue
3 Construction 2013 423
12 Financial Service 2012 838
9 Heath Care 2014 224
1 Tech 2011 466
5 Telecom 2010 843
Upvotes: 2