Reputation: 15
My dataframe df
is:
data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005'],
'Votes':[50, 100, 70, 26, 180, 100, 120, 46, 80, 129, 46, 95, 60, 23, 95, 16, 65, 35],
'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
df
Election Year Votes Party Region
0 2000 50 A a
1 2000 100 B a
2 2000 70 C a
3 2000 26 A b
4 2000 180 B b
5 2000 100 C b
6 2000 120 A c
7 2000 46 B c
8 2000 80 C c
9 2005 129 A a
10 2005 46 B a
11 2005 95 C a
12 2005 60 A b
13 2005 23 B b
14 2005 95 C b
15 2005 16 A c
16 2005 65 B c
17 2005 35 C c
I want to get the regions in which minimum two parties got more than 50 votes in every election?. So the desired output is:
Region
a
b
These are two regions in which miinimum two parties got 50 votes every year.
I tried to sort the "Election Year" and "Votes" and then groupby on election year and Region to then see if top three of each region is getting more than 50 votes. But it is giving a different result.
df1 = df.sort_values(['Election Year','Votes'], ascending=(True,False))
top_3 = df1.groupby(['Election Year', 'Region']).head(3).reset_index()
How can I fix this to get the desired result?
Upvotes: 1
Views: 76
Reputation: 8308
You could also try the following:
import pandas as pd
data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005'],
'Votes':[50, 100, 70, 26, 180, 100, 120, 46, 80, 129, 46, 95, 60, 23, 95, 16, 65, 35],
'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
x = df.where(df.Votes >= 50).groupby(['Election Year','Region']).count()
x[x.Party >= 2].reset_index().groupby('Region').count()
x = x[x.Party >= 2].reset_index().groupby('Region').count()
x[x['Election Year'] >= 2].index.values
This will give you:
array(['a', 'b'], dtype=object)
Upvotes: 1
Reputation: 15872
You can try withh groupby
and unstack
:
>>> ( df.query('Votes >= 50')
.groupby(['Region', 'Year'])
.size().unstack('Year')
.gt(1).all(1).loc[lambda x:x].index )
Index(['a', 'b'], dtype='object', name='Region')
Upvotes: 1