Filtering a dataframe based on the conditional count on a specific column

Question

I have the following data frame:

import pandas as pd

df = pd.DataFrame({'a':[1,2,3,4,4,1,2,2,3,1,3,3,4], 'b':[3,2,5,6,4,5,5,6,4,4,2,6,1], 'c':[4,4,5,4,3,5,3,3,3,2,4,3,5]})

df
    a  b  c
0   1  3  4
1   2  2  4
2   3  5  5
3   4  6  4
4   4  4  3
5   1  5  5
6   2  5  3
7   2  6  3
8   3  4  3
9   1  4  2
10  3  2  4
11  3  6  3
12  4  1  5

I want to filter the data frame so that, for each unique value of column 'a', its rows are kept only if at least 2 of them have a value in column 'c' greater than 3.

In the example above, the output data frame should keep every row except those where 'a' = 2.

For 'a' = 2, only one row has 'c' > 3, which is fewer than 2, so all rows with 'a' = 2 should be dropped. The other rows remain because 'a' = 1, 'a' = 3, and 'a' = 4 each have count('c' > 3) >= 2.
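To check the per-group counts described above, one can tally, for each value of 'a', how many rows have 'c' > 3. This is a minimal sketch that rebuilds the frame from the question; the names `counts` and `keep` are only for illustration. Summing a boolean Series counts its True values, so grouping the comparison by 'a' and summing gives one count per group.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 1, 2, 2, 3, 1, 3, 3, 4],
                   'b': [3, 2, 5, 6, 4, 5, 5, 6, 4, 4, 2, 6, 1],
                   'c': [4, 4, 5, 4, 3, 5, 3, 3, 3, 2, 4, 3, 5]})

# True where c > 3; summing per group of 'a' counts the True values
counts = df['c'].gt(3).groupby(df['a']).sum()
print(counts)

# Values of 'a' whose count reaches the threshold of 2
keep = counts[counts.ge(2)].index.tolist()
print(keep)  # [1, 3, 4]
```

Only 'a' = 2 falls below the threshold, with a single row where 'c' > 3.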

jezrael · Accepted Answer

Compare column c with 3 using Series.gt, then count the matches within each group of column a with GroupBy.transform and 'sum' (True values are treated as 1), and finally keep the rows whose count is greater than or equal to 2 using Series.ge in boolean indexing:

df = df[df['c'].gt(3).groupby(df['a']).transform('sum').ge(2)]
print (df)
    a  b  c
0   1  3  4
2   3  5  5
3   4  6  4
4   4  4  3
5   1  5  5
8   3  4  3
9   1  4  2
10  3  2  4
11  3  6  3
12  4  1  5
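To see what each step of the one-liner produces, the chain can be split into named intermediate Series. This is a sketch for illustration; the names `mask`, `per_row_count`, `keep`, `result`, and `alt` are hypothetical. The important detail is that `transform('sum')` returns a Series aligned to the original rows (each row gets its group's count), unlike a plain `sum()`, which returns one value per group, so the result can be used directly as a boolean mask. For comparison, the sketch also uses `GroupBy.filter`, a swapped-in alternative that keeps or drops whole groups and should return the same rows; it calls a Python function once per group, so it is typically slower when there are many groups.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 4, 1, 2, 2, 3, 1, 3, 3, 4],
                   'b': [3, 2, 5, 6, 4, 5, 5, 6, 4, 4, 2, 6, 1],
                   'c': [4, 4, 5, 4, 3, 5, 3, 3, 3, 2, 4, 3, 5]})

# Step 1: True where c > 3
mask = df['c'].gt(3)

# Step 2: count of True values per group of 'a', broadcast back to every row
per_row_count = mask.groupby(df['a']).transform('sum')

# Step 3: keep rows whose group count is at least 2
keep = per_row_count.ge(2)
result = df[keep]
print(result)

# Alternative: GroupBy.filter keeps every row of a group when the function returns True
alt = df.groupby('a').filter(lambda g: g['c'].gt(3).sum() >= 2)
print(alt.equals(result))  # True
```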
