Deepak
Deepak

Reputation: 59

How to get all the rows corresponding to maximum values of a column using groupby

For the given dataframe df as:

   Election Yr.  Party   States Votes
0     2000           A       a    50  
1     2000           A       b    30
2     2000           B       a    40
3     2000           B       b    50  
4     2000           C       a    30
5     2000           C       b    40
6     2005           A       a    50  
7     2005           A       b    30
8     2005           B       a    40
9     2005           B       b    50  
10    2005           C       a    30
11    2005           C       b    40

I want to get the Party that got the maximum Votes for a corresponding year. I have used the following code to groupby "Election Year" and "Party" and then .sum() to get the total votes for each party in every year.

df = df.groupby(['Election Yr.', 'Party']).sum()

Now how to get the party with maximum Votes each year? I am unable to get this.

Any support is highly appreciated.

Upvotes: 3

Views: 282

Answers (3)

Arturo Sbr
Arturo Sbr

Reputation: 6333

1. Using inner joins

You can start off with df before doing your first groupby. Then you get the maximum number of votes each year and merge on the year-votes combination to get the party that got the most votes per year.

# Original data
df = pd.DataFrame({'Election Yr.':[2000,2000,2000,2000,2000,2000,2005,2005,2005,2005,2005,2005],
                   'Party':['A','A','B','B','C','C','A','A','B','B','C','C',],
                   'Votes':[50,30,40,50,30,40,50,30,40,50,30,40]})

# Get number of votes per year-party
df = df.groupby(['Election Yr.','Party'])['Votes'].sum().reset_index()

# Get max number of votes per year
max_ = df.groupby('Election Yr.')['Votes'].max().reset_index()

# Merge on key
max_ = max_.merge(df, on=['Election Yr.','Votes'])

# Results
print(max_)

>    Election Yr.  Votes Party
> 0          2000     90     B
> 1          2005     90     B

2. Sorting and keeping first observation

Alternatively, you can sort by votes per year:

df = df.groupby(['Election Yr.','Party'])['Votes'].sum().reset_index()
df = df.sort_values(['Election Yr.','Votes'], ascending=False)
print(df.groupby('Election Yr.').first().reset_index())

print(df)

>    Election Yr. Party  Votes
> 0          2000     B     90
> 1          2005     B     90

Upvotes: 1

not_speshal
not_speshal

Reputation: 23146

Try using a combination of groupby and idxmax:

gb = df.groupby(["Election Yr.", "Party"]).sum()
gb.loc[gb.groupby("Election Yr.")["Votes"].idxmax()].reset_index()
>>> gb
   Election Yr. Party  Votes
0          2000     B     90
1          2005     B     90

Upvotes: 1

Related Questions