How to Use Groupby Quantile with Pandas Dataframe

Question

I have a Dataframe df like:

    Name     Date Item  Quantity  Unit Cost  Value
0   Clay  2018 Q1   AA         9       8.97  80.73
1   Clay  2018 Q1   BB         3      12.34  37.02
2   Clay  2018 Q1   CC         4       1.40   5.60
3   Clay  2018 Q1   DD         7       0.22   1.54
4   Clay  2018 Q2   DD         1       0.45   0.45
5   Clay  2018 Q2   AA         4       7.00  28.00
6   Clay  2018 Q2   EE         2       6.40  12.80
7   Clay  2018 Q2   CC         3       2.30   6.90
8    Ray  2018 Q1   AA         8       5.30  42.40
9    Ray  2018 Q1   DD         2       1.60   3.20
10   Ray  2018 Q2   CC         1       9.00   9.00
11   Ray  2018 Q2   DD         4       8.00  32.00
12   Ray  2018 Q1   EE         4       8.00  32.00
13   Ray  2018 Q1   FF         9       4.00  36.00
14   Ray  2018 Q2   GG         1       6.50   6.50
15   Ray  2018 Q2   HH         2       9.10  18.20

And I want to see the top quartile Item or Items by Value for a Name and Date. So output would be something like top_quartile_df:

    Name     Date Item  Quantity  Unit Cost  Value
0   Clay  2018 Q1   AA         9       8.97  80.73
1   Clay  2018 Q2   AA         4       7.00  28.00
2    Ray  2018 Q1   AA         8       5.30  42.40
3    Ray  2018 Q2   DD         4       8.00  32.00

I have tried something like:

df.groupby(['Name', 'Date', 'Item', 'Value']).quantile(.25)

but that gives me an error:

AttributeError: 'DataFrameGroupBy' object has no attribute 'quatile'

So how can I achieve the desired output?

Erfan · Accepted Answer

If I understand you correctly, you want GroupBy with pd.qcut to get the quantiles and then take the rows in the highest quantile:

quantiles = (
    df.groupby(['Name', 'Date'])['Value'].apply(lambda x: pd.qcut(x, 4, labels=[0, 0.25, 0.5, 1]))
)

top_quantile_df = df[quantiles.eq(1)]

    Name     Date Item  Quantity  Unit Cost  Value
0   Alex  2018 Q1   AA         9       8.97  80.73
5   Alex  2018 Q2   AA         4       7.00  28.00
8    Ray  2018 Q1   AA         8       5.30  42.40
11   Ray  2018 Q2   DD         4       8.00  32.00

How to Use Groupby Quantile with Pandas Dataframe

Answers (1)

Related Questions