Shew

Reputation: 1596

Filter a dataframe with two conditions on two different columns

I want to filter a dataframe based on two conditions on two different columns. In the example below, I want to filter the dataframe df so that it keeps only the rows of those uids that have at least 2 rows where val is greater than or equal to 4.

import pandas as pd
df = pd.DataFrame({'uid':[1,1,1,2,2,3,3,4,4,4],'iid':[11,12,13,12,13,13,14,14,11,12], 'val':[3,4,5,3,5,4,5,4,3,4]})

For this dataframe, my output should be

 df
   uid  iid  val
0    1   11    3
1    1   12    4
2    1   13    5
5    3   13    4
6    3   14    5
7    4   14    4
8    4   11    3
9    4   12    4

Here, I filtered out uid 2 because the number of rows with uid == 2 and val >= 4 is less than 2. I want to keep only the rows of uids for which the number of val values greater than or equal to 4 is at least 2.

Upvotes: 1

Views: 54

Answers (1)

Ben.T

Reputation: 29635

You need groupby.transform with sum: first check where val is greater than or equal to 4 (ge), sum those booleans per uid, then check that the result is ge 2 and use it as a boolean filter on df.

print(df[df['val'].ge(4).groupby(df['uid']).transform('sum').ge(2)])
   uid  iid  val
0    1   11    3
1    1   12    4
2    1   13    5
5    3   13    4
6    3   14    5
7    4   14    4
8    4   11    3
9    4   12    4
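
If the one-liner is hard to read, the same groupby.transform idea can be split into steps. This is only a sketch of the approach above; the intermediate names (is_high, high_per_uid, keep, result) are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'uid': [1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
    'iid': [11, 12, 13, 12, 13, 13, 14, 14, 11, 12],
    'val': [3, 4, 5, 3, 5, 4, 5, 4, 3, 4],
})

# Step 1: boolean Series, True where val >= 4.
is_high = df['val'].ge(4)

# Step 2: count the True values per uid and broadcast that count
# back onto every row of the same uid (this is what transform does).
high_per_uid = is_high.groupby(df['uid']).transform('sum')

# Step 3: keep the rows whose uid has at least 2 such values.
keep = high_per_uid.ge(2)
result = df[keep]
print(result)
```

Because transform returns a Series aligned with the original index, keep can be used directly as a mask on df with no merge or map step.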

EDIT: another way, which avoids groupby.transform, is to use loc to select the uid column for the rows where val is ge 4, call value_counts on it, and get True where the count is ge 2. Then map the result back onto the uid column to create the boolean filter on df. Same result, and potentially faster.

df[df['uid'].map(df.loc[df['val'].ge(4), 'uid'].value_counts().ge(2))]
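
One caveat with the map version, as far as I can tell: a uid that has no row at all with val >= 4 never appears in value_counts, so map gives NaN for it, and a mask containing NaN is not accepted for boolean indexing. That does not happen with the example data, since every uid has at least one val >= 4. A possible variant that sidesteps it uses isin instead of map. The extra row with uid 5 below is hypothetical and only there to trigger the edge case; counts and good_uids are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'uid': [1, 1, 1, 2, 2, 3, 3, 4, 4, 4],
    'iid': [11, 12, 13, 12, 13, 13, 14, 14, 11, 12],
    'val': [3, 4, 5, 3, 5, 4, 5, 4, 3, 4],
})

# Hypothetical extra row: uid 5 has no val >= 4 at all.
extra = pd.DataFrame({'uid': [5], 'iid': [11], 'val': [1]})
df_extra = pd.concat([df, extra], ignore_index=True)

# Count, per uid, the rows where val >= 4.
counts = df_extra.loc[df_extra['val'].ge(4), 'uid'].value_counts()

# uids with at least 2 such rows; uid 5 is simply absent here.
good_uids = counts[counts.ge(2)].index

# isin returns a plain boolean mask, with False for absent uids.
result = df_extra[df_extra['uid'].isin(good_uids)]
print(result)
```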

Upvotes: 2
