Deepak
Deepak

Reputation: 59

Pandas Groupby and sorting columns simultaneously

I have a dataframe dfas:

  Election Year     Votes   Vote %      Party              Region   
0   2000            42289   29.40   Janata Dal (United)     A
1   2000            27618   19.20   Rashtriya Janata Dal    A
2   2000            20886   14.50   Bahujan Samaj Party     B 
3   2000            17747   12.40   Congress                B
4   2000            14047   19.80   Independent             C
5   2000            17047   10.80   JLS                     C
6   2005            8358    15.80   Janvadi Party           A
7   2005            4428    13.10   Independent             A
8   2005            1647    1.20    Independent             B
9   2005            1610    11.10   Independent             B
10  2005            1334    15.06   Nationalist             C
11  2005            1834    18.06   NJM                     C
12  2010            21114   20.80   Independent             A
13  2010            1042    10.5    Bharatiya Janta Dal     A
14  2010            835     0.60    Independent             B
15  2010            14305   15.50   Independent             B
16  2010            22211   17.70   Congress                C
16  2010            20011   14.70   INC                     C

How can I get the list of the regions that has two or more parties getting more than 10% vote % every election year?

I have used the following code to sort "Vote %" in descending order after grouping by "Election year" and "Region" and to then compare the top 2 Vote% every year, but it is giving an error.

df1 = df.groupby(['Election Year','Region'])sort_values('Vote %', ascending = False).reset_index()

How to correct the error as I want to get the top 2 "Party" of each region in each year after the sorting and then see if they get more than 10% vote every election year?

Desired output:

Election Year    Region    Vote %
  2000             A        29.40
  2000             A        19.40
  2000             C        19.80
  2000             C        10.80
  2005             A        15.80
  2005             A        13.10
  2005             C        15.06
  2005             C        18.06
  2010             A        20.80
  2010             A        10.5
  2010             C        17.70
  2010             C        14.70

Output contains only regions having more than 10% vote every year and Election year and region name in sorted in ascending order. So, here only Region "A" and "C" will be there in the output.

Upvotes: 0

Views: 92

Answers (1)

Hamza usman ghani
Hamza usman ghani

Reputation: 2243

First sort values w.r.t "Vote %" using sort_values() and then groupby "Election Year" and "Region" and finally get top 2 rows using head(2)

df.sort_values(['Vote %'],ascending=False).groupby(['Election Year','Region']).head(2).reset_index(drop=True)

Upvotes: 1

Related Questions