Reputation: 556
I have a dataset of students at a university. I want to keep the observations only for those students who did not drop out, i.e. who have observations for all four years. For instance:
Name Year
Jacop 2010
Jacop 2011
Jacop 2012
Jacop 2013
Nina 2008
Nina 2009
Nina 2010
I need something like: count the rows per name, and if the count is smaller than 4, drop that student. How can I do it?
Upvotes: 1
Views: 233
Reputation: 862601
I think you need filter:
df = df.groupby('Name').filter(lambda x: len(x) >= 4)
print (df)
Name Year
0 Jacop 2010
1 Jacop 2011
2 Jacop 2012
3 Jacop 2013
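For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the sample rows in the question, and required_years and kept are hypothetical names I introduced for illustration.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question
df = pd.DataFrame({
    "Name": ["Jacop"] * 4 + ["Nina"] * 3,
    "Year": [2010, 2011, 2012, 2013, 2008, 2009, 2010],
})

# Hypothetical threshold name: minimum number of rows a student must have
required_years = 4

# filter calls the lambda once per group and keeps every row of the groups
# for which it returns True
kept = df.groupby("Name").filter(lambda g: len(g) >= required_years)
print(kept)
```

Note that len(g) counts rows, so this assumes each student has at most one row per year.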
Another solution with transform and boolean indexing:
df = df[df.groupby('Name')['Name'].transform('size') >= 4]
print (df)
Name Year
0 Jacop 2010
1 Jacop 2011
2 Jacop 2012
3 Jacop 2013
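To see why the transform version works, it may help to look at the intermediate steps. This is only a sketch on the sample rows from the question; sizes, mask and kept are names I made up for illustration.

```python
import pandas as pd

# Sample data rebuilt from the rows shown in the question
df = pd.DataFrame({
    "Name": ["Jacop"] * 4 + ["Nina"] * 3,
    "Year": [2010, 2011, 2012, 2013, 2008, 2009, 2010],
})

# transform('size') returns a Series aligned with df's index, where each row
# holds the number of rows in its own group
sizes = df.groupby("Name")["Name"].transform("size")

# Comparing against the threshold gives one boolean per original row
mask = sizes >= 4

# Boolean indexing keeps only the rows where the mask is True
kept = df[mask]
print(kept)
```

Because the mask has the same index as df, it can be passed straight to df[...] with no lambda called per group, which tends to be faster than filter when there are many groups.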
Upvotes: 2