iayork

Reputation: 6699

Pandas: Select based on number of items in a set

I have a dataframe like this (simplified):

    Year ID  Value
0   2000  A      0
1   2001  A      1
2   2000  A      2
3   2000  B      3
4   2001  B      4
5   2000  C      5
6   2001  C      6
7   1990  D      7
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12

I'm only interested in the IDs that are present in 3 or more distinct years. I can clumsily step through and test each ID:

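matches = []  # per-ID subsets that pass the test
for id_ in set(df['ID']):
    rows = df[df['ID'] == id_]
    if len(set(rows['Year'])) >= 3:  # 3 or more distinct years
        matches.append(rows)
df2 = pd.concat(matches)  # assumes pandas is imported as pd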

    Year ID  Value
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12

but it seems like there should be a simpler way.

Upvotes: 2

Views: 126

Answers (1)

unutbu

Reputation: 879421

Use groupby-filter, keeping only the groups whose Year column has at least 3 distinct values:

(df.groupby(['ID'])
   .filter(lambda x: x['Year'].nunique()>=3))

yields

    Year ID  Value
8   1990  E      8
9   1991  E      9
10  1993  E     10
11  1993  E     11
12  1994  E     12

[5 rows x 3 columns]
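For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and, as an alternative not mentioned above, also builds a boolean mask with groupby-transform, which may be worth trying on larger frames because it avoids calling a Python lambda per group. The variable names (filtered, year_counts, masked) are just illustrative.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "Year": [2000, 2001, 2000, 2000, 2001, 2000, 2001,
             1990, 1990, 1991, 1993, 1993, 1994],
    "ID": list("AAABBCCDEEEEE"),
    "Value": range(13),
})

# groupby-filter: keep every row of each ID that spans >= 3 distinct years.
filtered = df.groupby("ID").filter(lambda g: g["Year"].nunique() >= 3)

# Alternative sketch: broadcast each ID's distinct-year count back onto
# its rows with transform, then use it as a boolean mask.
year_counts = df.groupby("ID")["Year"].transform("nunique")
masked = df[year_counts >= 3]

print(masked)
```

Both approaches keep the original index and row order, so on this data they select the same five rows for ID E.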

Upvotes: 4
