Stacey
Stacey

Reputation: 5107

Backfill timeseries in pandas with last available data point

I have a data-frame (df) which takes a snapshot every hour of every day. Below is a partial output:

       date_time     score     distance
12/08/2021 21:00    1.7655   1538061.73
12/08/2021 22:00    1.7520   1531284.36
12/08/2021 23:00    1.7343   1595898.01
13/08/2021 00:00    1.8340   4117927.26
13/08/2021 01:00    1.8689  10397333.06
13/08/2021 02:00    1.8714   6825618.87
13/08/2021 07:00    1.9792   3842945.91
13/08/2021 08:00    1.9695   4876248.15
13/08/2021 09:00    2.0296   7668406.66
13/08/2021 10:00    2.0274   8542998.64
13/08/2021 11:00    2.0374   3705188.96

You can see that between 13/08/2021 02:00 and 13/08/2021 07:00 (i.e. 13/08/2021 03:00, 13/08/2021 04:00, 13/08/2021 05:00 and 13/08/2021 06:00) data is missing.

Is there a way to systematically identify and gaps and backfill with the last previously available data point. So the desired output in my example would be:

       date_time     score     distance
12/08/2021 21:00    1.7655   1538061.73
12/08/2021 22:00    1.7520   1531284.36
12/08/2021 23:00    1.7343   1595898.01
13/08/2021 00:00    1.8340   4117927.26
13/08/2021 01:00    1.8689  10397333.06
13/08/2021 02:00    1.8714   6825618.87
13/08/2021 03:00    1.8714   6825618.87
13/08/2021 04:00    1.8714   6825618.87
13/08/2021 05:00    1.8714   6825618.87
13/08/2021 06:00    1.8714   6825618.87
13/08/2021 07:00    1.9792   3842945.91
13/08/2021 08:00    1.9695   4876248.15
13/08/2021 09:00    2.0296   7668406.66
13/08/2021 10:00    2.0274   8542998.64
13/08/2021 11:00    2.0374   3705188.96

where the score and the distance from 13/08/2021 02:00 has been used to populate the missing data points.

Upvotes: 1

Views: 396

Answers (1)

Adrian
Adrian

Reputation: 532

If you set the index of the dataframe to use the timestamps, you can use pandas.resample() on your dataframe, setting it to generate a row every hour and forward-filling when null values are encountered...

df.set_index('date_time', drop=True, inplace=True)
df = df.resample('1H', fill_method='ffill')

enter image description here

Upvotes: 3

Related Questions