Reputation: 6699
I have a dataframe like this (simplified):
    Year ID  Value
0   2000  A      0
1   2001  A      1
2   2000  A      2
3   2000  B      3
4   2001  B      4
5   2000  C      5
6   2001  C      6
7   1990  D      7
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12
I'm only interested in the IDs that appear in 3 or more distinct years. I can clumsily step through and test each ID:
    import pandas as pd
    parts = []
    for id_ in set(df['ID']):
        if len(set(df[df['ID'] == id_]['Year'])) >= 3:  # 3+ distinct years
            parts.append(df[df['ID'] == id_])
    df2 = pd.concat(parts)
which gives
    Year ID  Value
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12
but it seems like there should be a simpler way.
Upvotes: 2
Views: 126
Reputation: 879421
Use groupby followed by filter, which keeps all rows of every group for which the function returns True:
    (df.groupby('ID')
       .filter(lambda x: x['Year'].nunique() >= 3))
yields
    Year ID  Value
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12
[5 rows x 3 columns]
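For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame is rebuilt by hand from the sample in the question, and the names filtered, mask and masked are just illustrative. It also shows an alternative that is not part of the answer above: building a boolean mask with transform('nunique'), which should select the same rows and may be faster when there are many groups because it avoids calling a Python lambda per group.

```python
import pandas as pd

# Sample data rebuilt by hand from the question.
df = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2000, 2001, 2000, 2001,
             1990, 1990, 1991, 1993, 1993, 1994],
    "ID": list("AAABBCCDEEEEE"),
    "Value": range(13),
})

# groupby/filter: keep every row of each ID that has >= 3 distinct years.
filtered = df.groupby("ID").filter(lambda g: g["Year"].nunique() >= 3)

# Alternative (an assumption, not from the answer): broadcast the per-ID
# distinct-year count back onto each row and use it as a boolean mask.
mask = df.groupby("ID")["Year"].transform("nunique") >= 3
masked = df[mask]

print(filtered)
```

Both approaches leave the original index intact, so the rows for ID E keep labels 8 through 12.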
Upvotes: 4