Reputation: 59
I have a dataframe df
as:
Election Year Votes Vote % Party Region
0 2000 42289 29.40 Janata Dal (United) A
1 2000 27618 19.20 Rashtriya Janata Dal A
2 2000 20886 14.50 Bahujan Samaj Party B
3 2000 17747 12.40 Congress B
4 2000 14047 19.80 Independent C
5 2000 17047 10.80 JLS C
6 2005 8358 15.80 Janvadi Party A
7 2005 4428 13.10 Independent A
8 2005 1647 1.20 Independent B
9 2005 1610 11.10 Independent B
10 2005 1334 15.06 Nationalist C
11 2005 1834 18.06 NJM C
12 2010 21114 20.80 Independent A
13 2010 1042 10.5 Bharatiya Janta Dal A
14 2010 835 0.60 Independent B
15 2010 14305 15.50 Independent B
16 2010 22211 17.70 Congress C
16 2010 20011 14.70 INC C
How can I get the list of the regions that has two or more parties getting more than 10% vote % every election year?
I have used the following code to sort "Vote %" in descending order after grouping by "Election year" and "Region" and to then compare the top 2 Vote% every year, but it is giving an error.
df1 = df.groupby(['Election Year','Region'])sort_values('Vote %', ascending = False).reset_index()
How to correct the error as I want to get the top 2 "Party" of each region in each year after the sorting and then see if they get more than 10% vote every election year?
Desired output:
Election Year Region Vote %
2000 A 29.40
2000 A 19.40
2000 C 19.80
2000 C 10.80
2005 A 15.80
2005 A 13.10
2005 C 15.06
2005 C 18.06
2010 A 20.80
2010 A 10.5
2010 C 17.70
2010 C 14.70
Output contains only regions having more than 10% vote every year and Election year and region name in sorted in ascending order. So, here only Region "A" and "C" will be there in the output.
Upvotes: 0
Views: 92
Reputation: 2243
First sort values w.r.t "Vote %" using sort_values()
and then groupby
"Election Year" and "Region" and finally get top 2 rows using head(2)
df.sort_values(['Vote %'],ascending=False).groupby(['Election Year','Region']).head(2).reset_index(drop=True)
Upvotes: 1