I want to filter a dataframe based on two conditions on two different columns. In the example below, I want to filter the dataframe df so that it only contains the uids that have at least 2 rows where the val column is greater than or equal to 4.
import pandas as pd

df = pd.DataFrame({'uid':[1,1,1,2,2,3,3,4,4,4],'iid':[11,12,13,12,13,13,14,14,11,12], 'val':[3,4,5,3,5,4,5,4,3,4]})
For this dataframe, my output should be:
uid iid val
0 1 11 3
1 1 12 4
2 1 13 5
5 3 13 4
6 3 14 5
7 4 14 4
8 4 11 3
9 4 12 4
Here, I filtered out uid 2 because the number of rows with uid == 2 and val >= 4 is less than 2. I want to keep only the rows of uids for which the number of val values greater than or equal to 4 is at least 2.
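To make the rule concrete, here is a small sketch (assuming pandas is installed) that counts, for each uid, how many rows have val >= 4 and lists the uids that reach the threshold of 2. The names counts and keep are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'uid': [1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
                   'iid': [11, 12, 13, 12, 13, 13, 14, 14, 11, 12],
                   'val': [3, 4, 5, 3, 5, 4, 5, 4, 3, 4]})

# Number of rows with val >= 4 for each uid (summing booleans counts the Trues)
counts = df['val'].ge(4).groupby(df['uid']).sum()
print(counts.to_dict())  # {1: 2, 2: 1, 3: 2, 4: 2}

# uids that have at least 2 such rows
keep = counts[counts >= 2].index.tolist()
print(keep)  # [1, 3, 4]
```

Only uid 2 falls below the threshold (it has a single row with val >= 4), which matches the expected output above.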
You need groupby.transform with sum, once you check where val is greater than or equal to (ge) 4. Then check that the result is ge 2 and use it as a boolean filter on df.
print(df[df['val'].ge(4).groupby(df['uid']).transform('sum').ge(2)])
uid iid val
0 1 11 3
1 1 12 4
2 1 13 5
5 3 13 4
6 3 14 5
7 4 14 4
8 4 11 3
9 4 12 4
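Broken into its intermediate steps, the chain above looks roughly like the sketch below (the variable names mask, per_row_count and result are illustrative). The key point is that transform returns a Series aligned with the original rows, so each row carries its own uid's count and can be compared directly.

```python
import pandas as pd

df = pd.DataFrame({'uid': [1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
                   'iid': [11, 12, 13, 12, 13, 13, 14, 14, 11, 12],
                   'val': [3, 4, 5, 3, 5, 4, 5, 4, 3, 4]})

# Step 1: boolean mask, True where val >= 4
mask = df['val'].ge(4)

# Step 2: per-uid count of True values, broadcast back to every row
# (transform keeps the original index, unlike a plain groupby().sum())
per_row_count = mask.groupby(df['uid']).transform('sum')
print(per_row_count.tolist())  # [2, 2, 2, 1, 1, 2, 2, 2, 2, 2]

# Step 3: keep rows whose uid has at least 2 such values
result = df[per_row_count.ge(2)]
print(result.index.tolist())  # [0, 1, 2, 5, 6, 7, 8, 9]
```

Passing the string 'sum' rather than the builtin sum lets pandas use its own groupby sum; recent pandas versions emit a FutureWarning when the builtin is passed.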
EDIT: another way, which avoids groupby.transform, is to use loc to select the uid column for the rows where val is ge 4, call value_counts on it and get True where the count is ge 2, then map the result back to the uid column to create the boolean filter on df. Same result, and potentially faster.
df[df['uid'].map(df.loc[df['val'].ge(4), 'uid'].value_counts().ge(2))]
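One caveat with the map version: it works on the sample data because every uid has at least one row with val >= 4. A uid with no such row is absent from the value_counts result, so map gives NaN for it, and pandas refuses to use a mask containing NaN for boolean indexing (it raises a ValueError). The sketch below adds a hypothetical uid 5 to show the situation and uses isin on the qualifying uids instead, which always yields a clean boolean mask. The extra row and the names counts and qualifying are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Sample data plus a hypothetical uid 5 whose val is never >= 4
df = pd.DataFrame({'uid': [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 5],
                   'iid': [11, 12, 13, 12, 13, 13, 14, 14, 11, 12, 15],
                   'val': [3, 4, 5, 3, 5, 4, 5, 4, 3, 4, 2]})

# Count rows with val >= 4 per uid; uid 5 does not appear in this Series at all
counts = df.loc[df['val'].ge(4), 'uid'].value_counts()

# uids that reach the threshold of 2
qualifying = counts[counts.ge(2)].index

# isin returns True/False for every row, so missing uids simply become False
result = df[df['uid'].isin(qualifying)]
print(sorted(result['uid'].unique().tolist()))  # [1, 3, 4]
```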