Outcast

Reputation: 5117

Filter dataframe based on the quantile per group of values

Suppose I have a dataframe like this:

import pandas as pd
df = pd.DataFrame({'col1':['A','A', 'A', 'B','B'], 'col2':[2, 4, 6, 3, 4]})

I want to keep only the rows whose col2 value is less than the x-th quantile of col2, where the quantile is computed separately for each group of col1.

For example, for the 60th percentile the dataframe should look like this:

  col1  col2
0    A     2
1    A     4
3    B     3

How can I do this efficiently in pandas?

Upvotes: 1

Views: 581

Answers (1)

BENY

Reputation: 323226

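You can use groupby with transform and quantile. transform broadcasts each group's quantile back to every row of that group, so the result lines up with the original index and can be compared against col2 directly: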

df[df.col2.lt(df.groupby('col1').col2.transform(lambda x : x.quantile(0.6)))]
  col1  col2
0    A     2
1    A     4
3    B     3
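
The same idea split into steps, as a minimal sketch. The helper name filter_below_group_quantile is made up for illustration, and it assumes pandas' default linear interpolation for quantile, which gives thresholds of roughly 4.4 for group A and 3.6 for group B:

```python
import pandas as pd

df = pd.DataFrame({'col1': ['A', 'A', 'A', 'B', 'B'],
                   'col2': [2, 4, 6, 3, 4]})


def filter_below_group_quantile(frame, group_col, value_col, q):
    """Keep rows whose value is strictly below the q-th quantile of their group.

    Hypothetical helper, not part of pandas.
    """
    # One threshold per row, aligned with the index of `frame`.
    thresholds = frame.groupby(group_col)[value_col].transform(
        lambda s: s.quantile(q)
    )
    # Boolean mask: strict "less than", same as Series.lt.
    return frame[frame[value_col] < thresholds]


# Inspect the per-group thresholds before filtering.
print(df.groupby('col1')['col2'].quantile(0.6))

result = filter_below_group_quantile(df, 'col1', 'col2', 0.6)
print(result)
```

The filtered frame keeps the original index labels. Chain .reset_index(drop=True) if you want them renumbered from 0.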

Upvotes: 3
