Dpk
Dpk

Reputation: 15

Get a column value based on the certain number of values of other column

My dataframe df is:

data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005'],
    'Votes':[50, 100, 70, 26, 180, 100, 120, 46, 80, 129, 46, 95, 60, 23, 95, 16, 65, 35], 
    'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
    'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)
df

    
    Election Year   Votes   Party   Region
  0   2000           50      A       a
  1   2000           100     B       a
  2   2000           70      C       a
  3   2000           26      A       b
  4   2000           180     B       b
  5   2000           100     C       b 
  6   2000           120     A       c
  7   2000           46      B       c
  8   2000           80      C       c
  9   2005           129     A       a
  10  2005           46      B       a
  11  2005           95      C       a
  12  2005           60      A       b
  13  2005           23      B       b
  14  2005           95      C       b
  15  2005           16      A       c
  16  2005           65      B       c
  17  2005           35      C       c

I want to get the regions in which minimum two parties got more than 50 votes in every election?. So the desired output is:

 Region
  a
  b

These are two regions in which miinimum two parties got 50 votes every year.

I tried to sort the "Election Year" and "Votes" and then groupby on election year and Region to then see if top three of each region is getting more than 50 votes. But it is giving a different result.

df1 = df.sort_values(['Election Year','Votes'], ascending=(True,False))
top_3 = df1.groupby(['Election Year', 'Region']).head(3).reset_index()

How can I fix this to get the desired result?

Upvotes: 1

Views: 76

Answers (2)

David
David

Reputation: 8308

You could also try the following:

import pandas as pd
data = {'Election Year':['2000', '2000','2000','2000','2000','2000','2000','2000','2000','2005','2005','2005','2005','2005','2005','2005','2005','2005'],
    'Votes':[50, 100, 70, 26, 180, 100, 120, 46, 80, 129, 46, 95, 60, 23, 95, 16, 65, 35], 
    'Party': ['A', 'B', 'C', 'A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C','A', 'B', 'C'],
    'Region': ['a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c','a', 'a', 'a', 'b', 'b', 'b','c', 'c', 'c']}
df = pd.DataFrame(data)

x = df.where(df.Votes >= 50).groupby(['Election Year','Region']).count()
x[x.Party >= 2].reset_index().groupby('Region').count()
x = x[x.Party >= 2].reset_index().groupby('Region').count()
x[x['Election Year'] >= 2].index.values

This will give you:

array(['a', 'b'], dtype=object)

Upvotes: 1

Sayandip Dutta
Sayandip Dutta

Reputation: 15872

You can try withh groupby and unstack:

>>> ( df.query('Votes >= 50')
        .groupby(['Region', 'Year'])
        .size().unstack('Year')
        .gt(1).all(1).loc[lambda x:x].index )

Index(['a', 'b'], dtype='object', name='Region')

Upvotes: 1

Related Questions