Reputation: 700
Total Pandas noob here, so have mercy please. I have a data sample with yearly entries of the shape pasted below:
{"Country":{"0":"Italy","1":"Italy","2":"Italy","3":"Italy","4":"Italy","5":"Italy","6":"Italy","7":"France","8":"France","9":"France","10":"France","11":"France","12":"France","13":"Spain","14":"Spain","15":"Spain","16":"Spain","17":"Spain","18":"Spain","19":"Spain"},"Year":{"0":2004,"1":2005,"2":2006,"3":2007,"4":2008,"5":2009,"6":2010,"7":2006,"8":2007,"9":2008,"10":2009,"11":2010,"12":2011,"13":2007,"14":2008,"15":2009,"16":2010,"17":2011,"18":2012,"19":2013},"Revenue":{"0":1000,"1":1200,"2":1300,"3":1400,"4":1450,"5":1300,"6":1200,"7":2200,"8":2100,"9":1900,"10":2300,"11":2400,"12":2500,"13":1150,"14":1230,"15":1300,"16":1200,"17":1050,"18":900,"19":950}}
I need a way to keep only the years that are common to all countries; in this sample, that would be 2007, 2008, 2009 and 2010.
I assume I should make a formula and apply it, but I just can't seem to find my way.
Upvotes: 3
Views: 86
Reputation: 59284
Use nunique twice: first get the number of unique countries, n, then keep only the years in which the number of unique countries equals n.
n = df.Country.nunique()
s = df.groupby('Year').Country.nunique().eq(n)
>>> print(s)
Year
2004 False
2005 False
2006 False
2007 True
2008 True
2009 True
2010 True
2011 False
2012 False
2013 False
Name: Country, dtype: bool
To get the years as a plain list:
>>> print(s[s].index.tolist())
[2007, 2008, 2009, 2010]
You can also use set intersection:
>>> set.intersection(*df.groupby('Country').Year.agg(set))
{2007, 2008, 2009, 2010}
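Both snippets above return the common years themselves. If the goal is the filtered rows, as the question seems to ask, the same nunique idea can be applied per row with transform. This is only a sketch: the DataFrame is rebuilt from the question's sample in compact form, and the names countries_per_year, common and common_years are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (same values, compact form).
df = pd.DataFrame({
    "Country": ["Italy"] * 7 + ["France"] * 6 + ["Spain"] * 7,
    "Year": list(range(2004, 2011)) + list(range(2006, 2012)) + list(range(2007, 2014)),
    "Revenue": [1000, 1200, 1300, 1400, 1450, 1300, 1200,
                2200, 2100, 1900, 2300, 2400, 2500,
                1150, 1230, 1300, 1200, 1050, 900, 950],
})

n = df["Country"].nunique()

# For every row, count how many distinct countries report that row's year.
countries_per_year = df.groupby("Year")["Country"].transform("nunique")

# Keep only the rows whose year is reported by all n countries.
common = df[countries_per_year.eq(n)]
common_years = sorted(common["Year"].unique().tolist())
print(common)
```

Because transform returns a result aligned with the original index, it can be used directly as a boolean mask without merging the per-year counts back in.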
Upvotes: 5
Reputation: 51185
Option 1: pivot + dropna
df.pivot(index='Year', columns='Country', values='Revenue').dropna().index
Option 2: crosstab + all
u = pd.crosstab(df.Year, df.Country)
u[u.all(axis=1)].index
Both produce:
Int64Index([2007, 2008, 2009, 2010], dtype='int64', name='Year')
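To see why both options work, it may help to look at the intermediate tables. The sketch below rebuilds the question's sample in compact form and uses keyword arguments throughout; the names wide, counts, years_pivot and years_crosstab are illustrative only.

```python
import pandas as pd

# Sample data rebuilt from the question (same values, compact form).
df = pd.DataFrame({
    "Country": ["Italy"] * 7 + ["France"] * 6 + ["Spain"] * 7,
    "Year": list(range(2004, 2011)) + list(range(2006, 2012)) + list(range(2007, 2014)),
    "Revenue": [1000, 1200, 1300, 1400, 1450, 1300, 1200,
                2200, 2100, 1900, 2300, 2400, 2500,
                1150, 1230, 1300, 1200, 1050, 900, 950],
})

# Option 1: one row per year, one column per country.
# A year a country did not report shows up as NaN, so dropna()
# leaves only the years present for every country.
wide = df.pivot(index="Year", columns="Country", values="Revenue")
years_pivot = wide.dropna().index.tolist()

# Option 2: crosstab counts the rows per (year, country) pair.
# A count of 0 is falsy, so all(axis=1) is True only for years
# in which every country has at least one row.
counts = pd.crosstab(df["Year"], df["Country"])
years_crosstab = counts[counts.all(axis=1)].index.tolist()

print(wide)
print(years_pivot, years_crosstab)
```

One difference worth noting: pivot raises an error if a (Year, Country) pair appears more than once, whereas crosstab simply counts the duplicates.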
Upvotes: 3