levant pied
levant pied

Reputation: 4501

Pandas groupby quantiles with indices

Say I have the following:

>>> numpy.random.seed(42)
>>> df = pandas.DataFrame(numpy.random.randint(0, 100, 19), columns=['val'], index=pandas.date_range('2021-03-01', '2021-03-04', freq='4H'))
>>> df
                     val
2021-03-01 00:00:00   51
2021-03-01 04:00:00   92
2021-03-01 08:00:00   14
2021-03-01 12:00:00   71
2021-03-01 16:00:00   60
2021-03-01 20:00:00   20
2021-03-02 00:00:00   82
2021-03-02 04:00:00   86
2021-03-02 08:00:00   74
2021-03-02 12:00:00   74
2021-03-02 16:00:00   87
2021-03-02 20:00:00   99
2021-03-03 00:00:00   23
2021-03-03 04:00:00    2
2021-03-03 08:00:00   21
2021-03-03 12:00:00   52
2021-03-03 16:00:00    1
2021-03-03 20:00:00   87
2021-03-04 00:00:00   29
>>> df.groupby(pandas.Grouper(freq='1D')).quantile(0.95, interpolation='higher')
            val
2021-03-01   92
2021-03-02   99
2021-03-03   87
2021-03-04   29

How can I also get the indices where quantiles are located within each group? I.e. my desired output is:

            val  idx
2021-03-01   92  2021-03-01 04:00:00
2021-03-02   99  2021-03-02 20:00:00
2021-03-03   87  2021-03-03 20:00:00
2021-03-04   29  2021-03-04 00:00:00

Upvotes: 2

Views: 459

Answers (2)

ALollz
ALollz

Reputation: 59549

Instead of quantile calculate the rank within each group and figure out which values are >= your quantile (since you use interpolate='higher'). Then sort the DataFrame, keep only rows above your quantile and take the first within group. Assigning a column as the index brings this along.

m = df.resample('D')['val'].rank(method='dense', pct=True).ge(0.95)
df1 = df.assign(index=df.index)[m].sort_values('val')

df1.groupby(df1.index.normalize()).first()

            val               index
2021-03-01   92 2021-03-01 04:00:00
2021-03-02   99 2021-03-02 20:00:00
2021-03-03   87 2021-03-03 20:00:00
2021-03-04   29 2021-03-04 00:00:00

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150745

One option is to use groupby().transform:

q95 = (df.groupby(pd.Grouper(freq='1D'))['val']
   .transform('quantile', q=0.95, interpolation='higher')
)
df[df['val']== q95]

Output:

                     val
2021-03-01 04:00:00   92
2021-03-02 20:00:00   99
2021-03-03 20:00:00   87
2021-03-04 00:00:00   29

Upvotes: 1

Related Questions