Selecting the rows that have observations for all of the years in Python

Question

I have a dataset of students at a university. I want to keep the observations for those students who did not drop out, i.e. who have observations for all four years. For instance:

Name        Year
Jacop       2010
Jacop       2011
Jacop       2012
Jacop       2013
Nina        2008
Nina        2009
Nina        2010

I need something like: count the rows per name, and if the count is smaller than 4, drop that student's rows. How can I do this?
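The counting idea described here maps almost literally onto value_counts plus isin. A minimal sketch, assuming the example table is rebuilt as a DataFrame with Name and Year columns and the threshold of 4 years is fixed:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed column names)
df = pd.DataFrame({
    "Name": ["Jacop"] * 4 + ["Nina"] * 3,
    "Year": [2010, 2011, 2012, 2013, 2008, 2009, 2010],
})

# Number of rows (observations) per student
counts = df["Name"].value_counts()

# Names that have at least 4 observations
keep = counts[counts >= 4].index

# Keep only the rows belonging to those students
result = df[df["Name"].isin(keep)]
print(result)
```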

jezrael · Accepted Answer

I think you need GroupBy.filter, which keeps only the groups for which the function returns True:

df = df.groupby('Name').filter(lambda x: len(x) >= 4)
print(df)
    Name  Year
0  Jacop  2010
1  Jacop  2011
2  Jacop  2012
3  Jacop  2013
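For a copy-paste runnable version of the filter approach, here is a minimal sketch that also builds the sample DataFrame (the construction of df is an assumption based on the question's table):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Name": ["Jacop"] * 4 + ["Nina"] * 3,
    "Year": [2010, 2011, 2012, 2013, 2008, 2009, 2010],
})

# filter() calls the lambda once per group (a sub-DataFrame for one Name)
# and keeps that group's rows only if the lambda returns True
result = df.groupby("Name").filter(lambda g: len(g) >= 4)
print(result)
```

filter returns the original rows with their original index, so Jacop's rows keep the labels 0 to 3; call reset_index(drop=True) if a fresh 0..n-1 index is wanted.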

Another solution uses transform, which attaches each group's size to every row of that group, combined with boolean indexing:

df = df[df.groupby('Name')['Name'].transform('size') >= 4]
print(df)
    Name  Year
0  Jacop  2010
1  Jacop  2011
2  Jacop  2012
3  Jacop  2013
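Both solutions count rows, so a student with a duplicated row for the same year would be counted as having more years than they really do. If that can happen in your data (an assumption; the question's sample has no duplicates), a sketch that counts distinct years with transform("nunique") instead:

```python
import pandas as pd

# Hypothetical data: Nina has a duplicated 2010 row, so she has 4 rows
# but only 3 distinct years
df = pd.DataFrame({
    "Name": ["Jacop"] * 4 + ["Nina"] * 4,
    "Year": [2010, 2011, 2012, 2013, 2008, 2009, 2010, 2010],
})

# transform broadcasts each group's result back to every row of the group,
# giving a Series aligned with df that works as a boolean mask
n_years = df.groupby("Name")["Year"].transform("nunique")
result = df[n_years >= 4]

# Counting rows instead wrongly keeps Nina
by_size = df[df.groupby("Name")["Year"].transform("size") >= 4]

print(result["Name"].unique().tolist())   # ['Jacop']
print(by_size["Name"].unique().tolist())  # ['Jacop', 'Nina']
```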
