Pandas groupby quantiles with indices

Question

Say I have the following:

>>> numpy.random.seed(42)
>>> df = pandas.DataFrame(numpy.random.randint(0, 100, 19), columns=['val'], index=pandas.date_range('2021-03-01', '2021-03-04', freq='4H'))
>>> df
                     val
2021-03-01 00:00:00   51
2021-03-01 04:00:00   92
2021-03-01 08:00:00   14
2021-03-01 12:00:00   71
2021-03-01 16:00:00   60
2021-03-01 20:00:00   20
2021-03-02 00:00:00   82
2021-03-02 04:00:00   86
2021-03-02 08:00:00   74
2021-03-02 12:00:00   74
2021-03-02 16:00:00   87
2021-03-02 20:00:00   99
2021-03-03 00:00:00   23
2021-03-03 04:00:00    2
2021-03-03 08:00:00   21
2021-03-03 12:00:00   52
2021-03-03 16:00:00    1
2021-03-03 20:00:00   87
2021-03-04 00:00:00   29
>>> df.groupby(pandas.Grouper(freq='1D')).quantile(0.95, interpolation='higher')
            val
2021-03-01   92
2021-03-02   99
2021-03-03   87
2021-03-04   29

How can I also get the indices where quantiles are located within each group? I.e. my desired output is:

            val  idx
2021-03-01   92  2021-03-01 04:00:00
2021-03-02   99  2021-03-02 20:00:00
2021-03-03   87  2021-03-03 20:00:00
2021-03-04   29  2021-03-04 00:00:00

ALollz · Accepted Answer

Instead of quantile calculate the rank within each group and figure out which values are >= your quantile (since you use interpolate='higher'). Then sort the DataFrame, keep only rows above your quantile and take the first within group. Assigning a column as the index brings this along.

m = df.resample('D')['val'].rank(method='dense', pct=True).ge(0.95)
df1 = df.assign(index=df.index)[m].sort_values('val')

df1.groupby(df1.index.normalize()).first()

            val               index
2021-03-01   92 2021-03-01 04:00:00
2021-03-02   99 2021-03-02 20:00:00
2021-03-03   87 2021-03-03 20:00:00
2021-03-04   29 2021-03-04 00:00:00

Pandas groupby quantiles with indices

Answers (2)

Related Questions