I have a DataFrame as given below:
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4,4,1,2,2,3,1,3,3,4], 'b':[3,2,5,6,4,5,5,6,4,4,2,6,1], 'c':[4,4,5,4,3,5,3,3,3,2,4,3,5]})
df
a b c
0 1 3 4
1 2 2 4
2 3 5 5
3 4 6 4
4 4 4 3
5 1 5 5
6 2 5 3
7 2 6 3
8 3 4 3
9 1 4 2
10 3 2 4
11 3 6 3
12 4 1 5
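I want to filter the DataFrame so that it keeps only the rows whose value in column 'a' belongs to a group in which at least 2 rows have 'c' greater than 3. In other words, for each unique value of 'a', the number of rows with 'c' > 3 must be greater than or equal to 2; otherwise all rows with that value of 'a' are dropped.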
For the example above, the expected output is:
a b c
0 1 3 4
2 3 5 5
3 4 6 4
4 4 4 3
5 1 5 5
8 3 4 3
9 1 4 2
10 3 2 4
11 3 6 3
12 4 1 5
The rows with 'a' = 2 are dropped because only one of them has 'c' > 3, which is fewer than 2. The other rows are kept because 'a' = 1, 'a' = 3 and 'a' = 4 each have count('c' > 3) >= 2.
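To see which groups pass the condition, the per-group counts can be computed directly. This is a minimal sketch reusing the question's sample data; the names `counts` and `keep` are illustrative only:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 1, 2, 2, 3, 1, 3, 3, 4],
                   'b': [3, 2, 5, 6, 4, 5, 5, 6, 4, 4, 2, 6, 1],
                   'c': [4, 4, 5, 4, 3, 5, 3, 3, 3, 2, 4, 3, 5]})

# For each value of 'a', count the rows where 'c' > 3
# (True is summed as 1, False as 0)
counts = df['c'].gt(3).groupby(df['a']).sum()
# a=1 -> 2, a=2 -> 1, a=3 -> 2, a=4 -> 2

# Values of 'a' whose count is at least 2
keep = counts[counts >= 2].index.tolist()
print(keep)  # [1, 3, 4]
```

Only 'a' = 2 falls below the threshold, which matches the expected output.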
Compare column c with 3 using Series.gt, then count the matches per group of column a with GroupBy.transform and sum (True values are counted as 1), and finally filter with Series.ge (greater than or equal to 2) in boolean indexing:
df = df[df['c'].gt(3).groupby(df['a']).transform('sum').ge(2)]
print (df)
a b c
0 1 3 4
2 3 5 5
3 4 6 4
4 4 4 3
5 1 5 5
8 3 4 3
9 1 4 2
10 3 2 4
11 3 6 3
12 4 1 5
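A possible alternative, not the method used above, is GroupBy.filter, which keeps or drops whole groups based on a function evaluated once per group. This is a minimal sketch assuming the question's sample data; `out` and `expected` are illustrative names:

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 1, 2, 2, 3, 1, 3, 3, 4],
                   'b': [3, 2, 5, 6, 4, 5, 5, 6, 4, 4, 2, 6, 1],
                   'c': [4, 4, 5, 4, 3, 5, 3, 3, 3, 2, 4, 3, 5]})

# Alternative: keep every row of a group whose count of c > 3 is at least 2
out = df.groupby('a').filter(lambda g: g['c'].gt(3).sum() >= 2)

# The transform-based boolean indexing from the answer, for comparison
expected = df[df['c'].gt(3).groupby(df['a']).transform('sum').ge(2)]

print(out.equals(expected))  # True
print(out.index.tolist())    # [0, 2, 3, 4, 5, 8, 9, 10, 11, 12]
```

Because filter calls the lambda once per group in Python, the vectorized transform version is typically faster when there are many groups; filter can be easier to read when the per-group condition is more complex.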