Reputation: 968
I have a Dataframe df
like:
Name Date Item Quantity Unit Cost Value
0 Clay 2018 Q1 AA 9 8.97 80.73
1 Clay 2018 Q1 BB 3 12.34 37.02
2 Clay 2018 Q1 CC 4 1.40 5.60
3 Clay 2018 Q1 DD 7 0.22 1.54
4 Clay 2018 Q2 DD 1 0.45 0.45
5 Clay 2018 Q2 AA 4 7.00 28.00
6 Clay 2018 Q2 EE 2 6.40 12.80
7 Clay 2018 Q2 CC 3 2.30 6.90
8 Ray 2018 Q1 AA 8 5.30 42.40
9 Ray 2018 Q1 DD 2 1.60 3.20
10 Ray 2018 Q2 CC 1 9.00 9.00
11 Ray 2018 Q2 DD 4 8.00 32.00
12 Ray 2018 Q1 EE 4 8.00 32.00
13 Ray 2018 Q1 FF 9 4.00 36.00
14 Ray 2018 Q2 GG 1 6.50 6.50
15 Ray 2018 Q2 HH 2 9.10 18.20
And I want to see the top quartile Item
or Items
by Value
for a Name
and Date
. So output would be something like top_quartile_df
:
Name Date Item Quantity Unit Cost Value
0 Clay 2018 Q1 AA 9 8.97 80.73
1 Clay 2018 Q2 AA 4 7.00 28.00
2 Ray 2018 Q1 AA 8 5.30 42.40
3 Ray 2018 Q2 DD 4 8.00 32.00
I have tried something like:
df.groupby(['Name', 'Date', 'Item', 'Value']).quantile(.25)
but that gives me an error:
AttributeError: 'DataFrameGroupBy' object has no attribute 'quatile'
So how can I achieve the desired output?
Upvotes: 0
Views: 362
Reputation: 42886
If I understand you correctly, you want GroupBy
with pd.qcut
to get the quantiles and then take the rows in the highest quantile:
quantiles = (
df.groupby(['Name', 'Date'])['Value'].apply(lambda x: pd.qcut(x, 4, labels=[0, 0.25, 0.5, 1]))
)
top_quantile_df = df[quantiles.eq(1)]
Name Date Item Quantity Unit Cost Value
0 Alex 2018 Q1 AA 9 8.97 80.73
5 Alex 2018 Q2 AA 4 7.00 28.00
8 Ray 2018 Q1 AA 8 5.30 42.40
11 Ray 2018 Q2 DD 4 8.00 32.00
Upvotes: 1