William Grimes

Reputation: 707

Pandas groupby rolling quantile for group

What I want to do is take a dataset consisting of an x value for every minute over several years and create a "typical week" of data: a dataframe with 60*24*7 = 10080 rows, one per minute of the week. Each row holds a set of quantile/percentile values, e.g. [0.1, 0.25, 0.5, 0.75, 0.9]. The quantiles for a given minute of the week are calculated from the x values at that minute across the whole historic dataset, together with the values from the five minutes before and the five minutes after that minute.

So the output would have one row for every minute in a week, with the percentiles of x computed over all historic values for that minute plus the minutes before and after it, e.g.

input data:

                                x    dow    hour  minute
               time                                     
2016-10-18 07:55:00     29.500000      0       7      55
2016-10-18 07:56:00     35.000000      0       7      56
2016-10-18 07:57:00     24.666667      0       7      57
2016-10-18 07:58:00     31.833333      0       7      58
2016-10-18 07:59:00     35.500000      0       7      59
2016-10-18 08:00:00     35.500000      0       8      00 
2016-10-18 08:01:00     33.500000      0       8      01
...                     ...          ...     ...     ...
2018-10-18 10:01:00     32.500000      6      10      01
2018-10-18 10:02:00     34.500000      6      10      02
2018-10-18 10:03:00     39.500000      6      10      03

output data:

dow    hour    minute    percentile1    percentile2    percentile3    percentile4    percentile5
3         4        58             25             28             33             44             50
...
...

How could I achieve this in pandas? The following seems to be a step in the right direction, but I'm not sure whether the rolling window includes the minutes before and after as intended:

df.groupby([df.dow, df.hour, df.minute]).rolling(5).apply(pd.quantile, [0.25, 0.30, 0.50])

Upvotes: 1

Views: 995

Answers (1)

BENY

Reputation: 323226

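You can do this with a groupby and a per-group rolling quantile:

import pandas as pd

# Select x within each (dow, hour, minute) group.
g=df.groupby(['dow', 'hour', 'minute'])['x']

# One column per quantile. transform keeps the original row index, and axis is passed by keyword for recent pandas.
s=pd.concat({y:g.transform(lambda x : x.rolling(5,min_periods=1).quantile(y)) for y in [0.25,0.30,0.50]},axis=1)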
s
Out[298]: 
        0.25       0.30       0.50
0  29.500000  29.500000  29.500000
1  35.000000  35.000000  35.000000
2  24.666667  24.666667  24.666667
3  31.833333  31.833333  31.833333
4  35.500000  35.500000  35.500000
5  33.500000  33.500000  33.500000

yourdf=pd.concat([df,s],axis=1)
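
Note that a rolling window applied inside each (dow, hour, minute) group slides over successive occurrences of the same minute (one per week), not over the neighbouring minutes. If the goal is the one stated in the question (quantiles for each minute of the week, pooled with the five minutes either side), one possible approach is sketched below. It is only a sketch under assumptions: the function name typical_week_quantiles, the minute_of_week key, the half_window parameter and the q-prefixed column names are all illustrative, and it assumes dow runs from 0 to 6 and that the window should wrap around the end of the week.

```python
import numpy as np
import pandas as pd

MINUTES_PER_WEEK = 60 * 24 * 7  # 10080


def typical_week_quantiles(df, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9), half_window=5):
    """Quantiles of x per minute of the week, pooled over +/- half_window minutes.

    Assumes df has columns x, dow (0-6), hour and minute, as in the question.
    """
    # Collapse (dow, hour, minute) into a single minute-of-week key, 0..10079.
    mow = (df["dow"] * 1440 + df["hour"] * 60 + df["minute"]).to_numpy()
    x = df["x"].to_numpy()

    # Each observation contributes to every window centred within
    # half_window minutes of it; the modulo wraps around the week boundary.
    offsets = np.arange(-half_window, half_window + 1)
    keys = (mow[:, None] + offsets[None, :]) % MINUTES_PER_WEEK

    pooled = pd.DataFrame({
        "minute_of_week": keys.ravel(),        # row-major: all offsets of obs 0, then obs 1, ...
        "x": np.repeat(x, len(offsets)),       # same ordering as keys.ravel()
    })

    # One row per minute of the week, one column per quantile.
    out = pooled.groupby("minute_of_week")["x"].quantile(list(quantiles)).unstack()
    out.columns = [f"q{q}" for q in out.columns]

    # Recover dow / hour / minute from the key for readability.
    m = out.index.to_numpy()
    out.insert(0, "minute", m % 60)
    out.insert(0, "hour", (m // 60) % 24)
    out.insert(0, "dow", m // 1440)
    return out
```

Calling typical_week_quantiles(df) on a frame shaped like the input in the question should return one row per minute of the week that has any data in its window, so 10080 rows for a complete dataset. The pooled frame is about 11 times the size of the input, which may matter for several years of minute data.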

Upvotes: 1
