Reputation: 221
I have a data frame that looks like this:
CP AID type
1 1 b
1 2 b
1 3 a
2 4 a
2 4 b
3 5 b
3 6 a
3 7 b
I would like to groupby the CP column and filter so it only returns rows where the CP has at least 3 unique 'pairs' from the AID column.
The result should look like this:
CP AID type
1 1 b
1 2 b
1 3 a
3 5 b
3 6 a
3 7 b
Upvotes: 1
Views: 212
Reputation: 42916
You can groupby
in combination with unique
:
m = df.groupby('CP').AID.transform('unique').str.len() >= 3
print(df[m])
CP AID type
0 1 1 b
1 1 2 b
2 1 3 a
5 3 5 b
6 3 6 a
7 3 7 b
Or as RafaelC mentioned in the comments:
m = df.groupby('CP').AID.transform('nunique').ge(3)
print(df[m])
CP AID type
0 1 1 b
1 1 2 b
2 1 3 a
5 3 5 b
6 3 6 a
7 3 7 b
Upvotes: 3
Reputation: 383
You can do that:
count = df1[['CP', 'AID']].groupby('CP').count().reset_index()
df1 = df1[df1['CP'].isin(count.loc[count['AID'] == 3,'CP'].values.tolist())]
Upvotes: 0