Reputation: 4501
Say I have the following:
>>> numpy.random.seed(42)
>>> df = pandas.DataFrame(numpy.random.randint(0, 100, 19), columns=['val'], index=pandas.date_range('2021-03-01', '2021-03-04', freq='4H'))
>>> df
val
2021-03-01 00:00:00 51
2021-03-01 04:00:00 92
2021-03-01 08:00:00 14
2021-03-01 12:00:00 71
2021-03-01 16:00:00 60
2021-03-01 20:00:00 20
2021-03-02 00:00:00 82
2021-03-02 04:00:00 86
2021-03-02 08:00:00 74
2021-03-02 12:00:00 74
2021-03-02 16:00:00 87
2021-03-02 20:00:00 99
2021-03-03 00:00:00 23
2021-03-03 04:00:00 2
2021-03-03 08:00:00 21
2021-03-03 12:00:00 52
2021-03-03 16:00:00 1
2021-03-03 20:00:00 87
2021-03-04 00:00:00 29
>>> df.groupby(pandas.Grouper(freq='1D')).quantile(0.95, interpolation='higher')
val
2021-03-01 92
2021-03-02 99
2021-03-03 87
2021-03-04 29
How can I also get the indices where quantiles are located within each group? I.e. my desired output is:
val idx
2021-03-01 92 2021-03-01 04:00:00
2021-03-02 99 2021-03-02 20:00:00
2021-03-03 87 2021-03-03 20:00:00
2021-03-04 29 2021-03-04 00:00:00
Upvotes: 2
Views: 459
Reputation: 59549
Instead of quantile calculate the rank
within each group and figure out which values are >=
your quantile (since you use interpolate='higher'
). Then sort the DataFrame, keep only rows above your quantile and take the first
within group. Assigning a column as the index brings this along.
m = df.resample('D')['val'].rank(method='dense', pct=True).ge(0.95)
df1 = df.assign(index=df.index)[m].sort_values('val')
df1.groupby(df1.index.normalize()).first()
val index
2021-03-01 92 2021-03-01 04:00:00
2021-03-02 99 2021-03-02 20:00:00
2021-03-03 87 2021-03-03 20:00:00
2021-03-04 29 2021-03-04 00:00:00
Upvotes: 2
Reputation: 150745
One option is to use groupby().transform
:
q95 = (df.groupby(pd.Grouper(freq='1D'))['val']
.transform('quantile', q=0.95, interpolation='higher')
)
df[df['val']== q95]
Output:
val
2021-03-01 04:00:00 92
2021-03-02 20:00:00 99
2021-03-03 20:00:00 87
2021-03-04 00:00:00 29
Upvotes: 1