Reputation: 73
I have a set of hourly data taken from 07-Feb-19 to 17-Feb-19:
t v_amm v_alc v_no2
0 2019-02-07 08:00:00+00:00 0.320000 0.344000 1.612000
1 2019-02-07 09:00:00+00:00 0.322889 0.391778 1.580889
2 2019-02-07 10:00:00+00:00 0.209375 0.325208 2.371250
...
251 2019-02-17 19:00:00+00:00 1.082041 0.652041 0.967143
252 2019-02-17 20:00:00+00:00 0.936923 0.598654 1.048077
253 2019-02-17 21:00:00+00:00 0.652553 0.499574 1.184894
and another similar set of hourly data taken from 01-Mar-19 to 11-Mar-19:
t v_amm v_alc v_no2
0 2019-03-01 00:00:00+00:00 0.428222 0.384444 1.288222
1 2019-03-01 01:00:00+00:00 0.398600 0.359600 1.325800
2 2019-03-01 02:00:00+00:00 0.365682 0.352273 1.360000
...
244 2019-03-11 04:00:00+00:00 0.444048 0.415238 1.265000
245 2019-03-11 05:00:00+00:00 0.590698 0.591395 1.156977
246 2019-03-11 06:00:00+00:00 0.497872 0.465319 1.228298
However, there is no data available between 17-Feb-19 and 01-Mar-19. Hence, I'd like to find the hourly average data based on the day of the week to simulate the missing hourly data between 17-Feb-19 and 01-Mar-19.
In other words, using all the hourly data from the same day of the week and find the average for each hour for that day. Expected output for 17-Feb-19 to 01-Mar-19 is something like:
t v_amm v_alc v_no2
0 2019-02-17 22:00:00+00:00 1.082041 0.652041 0.967143
1 2019-02-17 23:00:00+00:00 0.936923 0.598654 1.048077
2 2019-02-18 00:00:00+00:00 0.652553 0.499574 1.184894
...
250 2019-02-29 21:00:00+00:00 0.428222 0.384444 1.288222
251 2019-02-29 22:00:00+00:00 0.398600 0.359600 1.325800
252 2019-02-29 23:00:00+00:00 0.365682 0.352273 1.360000
Does anyone know how to obtain this in pandas?
Upvotes: 0
Views: 201
Reputation: 493
I'd solve this problem by adding a temporary column "day_of_week". You can generate this value easily with pandas using:
df['day_of_week'] = df.t.dt.dayofweek
(pandas.DatetimeIndex.dayofweek documentation)
Next you would need to generate the average value for each weekday:
daily_mean = df.groupby(by='day_of_week').mean()
pandas.DataFrame.groupby documentation
from here on the next steps depend on which values you need. The daily_means variable has all mean values you need.
The next step would likely be to create the missing rows by generating the date values, generating the corresponding weekday and inserting the generated mean values.
Upvotes: 2