Reputation: 707
What I want to do is take a dataset consisting of an x value for every minute over several years and create a "typical week's" worth of data, i.e. a dataframe with 60*24*7 = 10080 rows, where each row holds specific quantile/percentile values, e.g. [0.1, 0.25, 0.5, 0.75, 0.9]. The quantiles are calculated from the x values for that minute of the week across the whole historic dataset, together with the x values from the five minutes before and the five minutes after that minute.
So the output would have one row for every minute in a week, containing the percentiles of the x values for that minute, with the minutes before and after included in the percentile calculation, e.g.
input data:
x dow hour minute
time
2016-10-18 07:55:00 29.500000 0 7 55
2016-10-18 07:56:00 35.000000 0 7 56
2016-10-18 07:57:00 24.666667 0 7 57
2016-10-18 07:58:00 31.833333 0 7 58
2016-10-18 07:59:00 35.500000 0 7 59
2016-10-18 08:00:00 35.500000 0 8 00
2016-10-18 08:01:00 33.500000 0 8 01
... ... ... ... ...
2018-10-18 10:01:00 32.500000 6 10 01
2018-10-18 10:02:00 34.500000 6 10 02
2018-10-18 10:03:00 39.500000 6 10 03
output data:
dow hour minute percentile1 percentile2 percentile3 percentile4 percentile5
3 4 58 25 28 33 44 50
...
...
How could I achieve this in pandas? This seems to be a step in the right direction, but I'm not sure whether the rolling includes the minutes before and after as intended:
df.groupby([df.dow, df.hour, df.minute]).rolling(5).apply(pd.quantile, [0.25, 0.30, 0.50])
Upvotes: 1
Views: 995
Reputation: 323226
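You can do it with a rolling quantile inside each group:
# group_keys=False keeps the original row index, so the result aligns with df
g = df.groupby(['dow', 'hour', 'minute'], group_keys=False)['x']
# one column per quantile; recent pandas requires axis to be passed by keyword
s = pd.concat({y: g.apply(lambda x: x.rolling(5, min_periods=1).quantile(y)) for y in [0.25, 0.30, 0.50]}, axis=1)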
s
Out[298]:
0.25 0.30 0.50
0 29.500000 29.500000 29.500000
1 35.000000 35.000000 35.000000
2 24.666667 24.666667 24.666667
3 31.833333 31.833333 31.833333
4 35.500000 35.500000 35.500000
5 33.500000 33.500000 33.500000
yourdf=pd.concat([df,s],axis=1)
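One caveat: rolling inside each (dow, hour, minute) group slides over successive occurrences of that same minute (one per week), so it does not pull in the five minutes before and after. If the neighbouring minutes should be pooled, something along these lines might work. It is only a sketch, assuming dow, hour and minute are integer columns; the function name typical_week_quantiles and its parameters are made up for illustration. Each observation is copied into every minute-of-week bucket within the window, and the quantiles are then taken per bucket.

```python
import numpy as np
import pandas as pd

MINUTES_PER_WEEK = 60 * 24 * 7  # 10080

def typical_week_quantiles(df, qs=(0.1, 0.25, 0.5, 0.75, 0.9), half_window=5):
    """Quantiles of x per minute of the week, pooling +/- half_window minutes.

    Hypothetical helper; assumes integer columns 'dow', 'hour', 'minute'.
    """
    # Minute-of-week position of every observation (0 .. 10079).
    mow = (df["dow"].to_numpy() * 1440
           + df["hour"].to_numpy() * 60
           + df["minute"].to_numpy())
    x = df["x"].to_numpy()

    # Assign each observation to every bucket within half_window minutes,
    # wrapping around the end of the week.
    offsets = np.arange(-half_window, half_window + 1)
    buckets = (mow[:, None] + offsets[None, :]) % MINUTES_PER_WEEK
    pooled = pd.DataFrame({"mow": buckets.ravel(),
                           "x": np.repeat(x, offsets.size)})

    # One row per minute of the week, one column per requested quantile.
    out = pooled.groupby("mow")["x"].quantile(list(qs)).unstack()

    # Recover dow / hour / minute from the minute-of-week index.
    idx = out.index.to_numpy()
    out.insert(0, "minute", idx % 60)
    out.insert(0, "hour", (idx // 60) % 24)
    out.insert(0, "dow", idx // 1440)
    return out.reset_index(drop=True)
```

Calling typical_week_quantiles(df) on the full history should give at most 10080 rows, with the quantile columns labelled by their float values. The pooled frame is (2 * half_window + 1) times the size of the input, so memory use grows accordingly.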
Upvotes: 1