Reputation: 5107
I have a data-frame (df
) which takes a snapshot every hour of every day. Below is a partial output:
date_time score distance
12/08/2021 21:00 1.7655 1538061.73
12/08/2021 22:00 1.7520 1531284.36
12/08/2021 23:00 1.7343 1595898.01
13/08/2021 00:00 1.8340 4117927.26
13/08/2021 01:00 1.8689 10397333.06
13/08/2021 02:00 1.8714 6825618.87
13/08/2021 07:00 1.9792 3842945.91
13/08/2021 08:00 1.9695 4876248.15
13/08/2021 09:00 2.0296 7668406.66
13/08/2021 10:00 2.0274 8542998.64
13/08/2021 11:00 2.0374 3705188.96
You can see that between 13/08/2021 02:00
and 13/08/2021 07:00
(i.e. 13/08/2021 03:00
, 13/08/2021 04:00
, 13/08/2021 05:00
and 13/08/2021 06:00
) data is missing.
Is there a way to systematically identify and gaps and backfill with the last previously available data point. So the desired output in my example would be:
date_time score distance
12/08/2021 21:00 1.7655 1538061.73
12/08/2021 22:00 1.7520 1531284.36
12/08/2021 23:00 1.7343 1595898.01
13/08/2021 00:00 1.8340 4117927.26
13/08/2021 01:00 1.8689 10397333.06
13/08/2021 02:00 1.8714 6825618.87
13/08/2021 03:00 1.8714 6825618.87
13/08/2021 04:00 1.8714 6825618.87
13/08/2021 05:00 1.8714 6825618.87
13/08/2021 06:00 1.8714 6825618.87
13/08/2021 07:00 1.9792 3842945.91
13/08/2021 08:00 1.9695 4876248.15
13/08/2021 09:00 2.0296 7668406.66
13/08/2021 10:00 2.0274 8542998.64
13/08/2021 11:00 2.0374 3705188.96
where the score and the distance from 13/08/2021 02:00
has been used to populate the missing data points.
Upvotes: 1
Views: 396
Reputation: 532
If you set the index of the dataframe to use the timestamps, you can use pandas.resample() on your dataframe, setting it to generate a row every hour and forward-filling when null values are encountered...
df.set_index('date_time', drop=True, inplace=True)
df = df.resample('1H', fill_method='ffill')
Upvotes: 3