Finding average data per hour based on the day of the week to simulate data for missing days

Question

I have a set of hourly data taken from 07-Feb-19 to 17-Feb-19:

                             t     v_amm     v_alc     v_no2
0    2019-02-07 08:00:00+00:00  0.320000  0.344000  1.612000
1    2019-02-07 09:00:00+00:00  0.322889  0.391778  1.580889
2    2019-02-07 10:00:00+00:00  0.209375  0.325208  2.371250
...
251  2019-02-17 19:00:00+00:00  1.082041  0.652041  0.967143
252  2019-02-17 20:00:00+00:00  0.936923  0.598654  1.048077
253  2019-02-17 21:00:00+00:00  0.652553  0.499574  1.184894

and another similar set of hourly data taken from 01-Mar-19 to 11-Mar-19:

                            t     v_amm     v_alc     v_no2
0   2019-03-01 00:00:00+00:00  0.428222  0.384444  1.288222
1   2019-03-01 01:00:00+00:00  0.398600  0.359600  1.325800
2   2019-03-01 02:00:00+00:00  0.365682  0.352273  1.360000
...
244 2019-03-11 04:00:00+00:00  0.444048  0.415238  1.265000
245 2019-03-11 05:00:00+00:00  0.590698  0.591395  1.156977
246 2019-03-11 06:00:00+00:00  0.497872  0.465319  1.228298

However, there is no data available between 17-Feb-19 and 01-Mar-19. Hence, I'd like to find the hourly average data based on the day of the week to simulate the missing hourly data between 17-Feb-19 and 01-Mar-19.

In other words, using all the hourly data from the same day of the week and find the average for each hour for that day. Expected output for 17-Feb-19 to 01-Mar-19 is something like:

                             t     v_amm     v_alc     v_no2
0    2019-02-17 22:00:00+00:00  1.082041  0.652041  0.967143
1    2019-02-17 23:00:00+00:00  0.936923  0.598654  1.048077
2    2019-02-18 00:00:00+00:00  0.652553  0.499574  1.184894
...
250  2019-02-29 21:00:00+00:00  0.428222  0.384444  1.288222
251  2019-02-29 22:00:00+00:00  0.398600  0.359600  1.325800
252  2019-02-29 23:00:00+00:00  0.365682  0.352273  1.360000

Does anyone know how to obtain this in pandas?

SerAlejo · Accepted Answer

I'd solve this problem by adding a temporary column "day_of_week". You can generate this value easily with pandas using:

df['day_of_week'] = df.t.dt.dayofweek

(pandas.DatetimeIndex.dayofweek documentation)

Next you would need to generate the average value for each weekday:

daily_mean = df.groupby(by='day_of_week').mean()

pandas.DataFrame.groupby documentation

from here on the next steps depend on which values you need. The daily_means variable has all mean values you need.

The next step would likely be to create the missing rows by generating the date values, generating the corresponding weekday and inserting the generated mean values.

Finding average data per hour based on the day of the week to simulate data for missing days

Answers (1)

Related Questions