How to extrapolate a periodic time series in Pandas?

Question

In Python 3.5 with Pandas 0.20, say I have a one-year periodic time series:

import pandas as pd
import numpy as np

start_date = pd.to_datetime("2015-01-01T01:00:00.000Z", infer_datetime_format=True)
end_date = pd.to_datetime("2015-12-31T23:00:00.000Z", infer_datetime_format=True)
index = pd.date_range(start=start_date,
                      end=end_date,
                      freq="60min")
# Hours elapsed since the start of the series
time = np.array((index - start_date) / np.timedelta64(1, 'h'), dtype=int)
df = pd.DataFrame(index=index)
df["foo"] = np.sin(2 * np.pi * time / len(time))

df.plot()

I want to extrapolate the time series periodically onto a new index, for example with:

new_start_date = pd.to_datetime("2017-01-01T01:00:00.000Z", infer_datetime_format=True)
new_end_date = pd.to_datetime("2019-12-31T23:00:00.000Z", infer_datetime_format=True)
new_index = pd.date_range(start=new_start_date,
                          end=new_end_date,
                          freq="60min")

I would like to use some kind of extrapolate_periodic method to get:

# DO NOT RUN
new_df = df.extrapolate_periodic(index=new_index)
# END DO NOT RUN

new_df.plot()

What is the best way to do such a thing in pandas?

How can I define a periodicity and get data from a new index easily?

Little Bobby Tables · Accepted Answer

I think I have what you are looking for, though it is not a simple pandas method.

Carrying on directly from where you left off,

def extrapolate_periodic(df, new_index):
    # Average profile for each (day of year, hour) pair in the source data
    df_right = df.groupby([df.index.dayofyear, df.index.hour]).mean()
    # Empty frame on the target index, with the same two keys as columns
    df_left = pd.DataFrame({'new_index': new_index}).set_index('new_index')
    df_left = df_left.assign(dayofyear=lambda x: x.index.dayofyear,
                             hour=lambda x: x.index.hour)
    # Look up the profile value for every new timestamp, then drop the keys
    df = (pd.merge(df_left, df_right, left_on=['dayofyear', 'hour'],
                   right_index=True, suffixes=('', '_y'))
            .drop(['dayofyear', 'hour'], axis=1))
    return df.sort_index()

new_df = extrapolate_periodic(df, new_index)
# or as a method style
# new_df = df.pipe(extrapolate_periodic, new_index)

new_df.plot()

If you have more than a year's worth of data, it will take the mean of each duplicated day-hour. Here mean could be swapped for last if you only want the most recent reading.
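To make the difference between the two aggregations concrete, here is a minimal sketch on made-up data: two years of hourly values where the second year is shifted by a constant. The example data and the names profile_mean and profile_last are assumptions for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

# Two years of hourly data (hypothetical): 0.0 throughout 2015, 10.0 throughout 2016,
# so the two aggregations give visibly different results.
index = pd.date_range("2015-01-01 00:00", "2016-12-31 23:00", freq="60min")
df = pd.DataFrame({"foo": np.where(index.year == 2015, 0.0, 10.0)}, index=index)

# Same grouping keys as in extrapolate_periodic
keys = [df.index.dayofyear, df.index.hour]
profile_mean = df.groupby(keys).mean()  # average over both years
profile_last = df.groupby(keys).last()  # most recent reading only

print(profile_mean.loc[(1, 0), "foo"])  # 5.0
print(profile_last.loc[(1, 0), "foo"])  # 10.0
```

Note that last relies on the rows being in chronological order, so sort the index first if that is not guaranteed.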

This will not work if you do not have a full year's worth of data, but you could fix this by adding a reindex to complete the year and then using interpolate (for example with a polynomial method) to fill in the missing values in the foo column.
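A rough sketch of that reindex-then-interpolate idea, assuming the gap sits inside the year (here, all of June is removed from made-up data). It uses method="linear" rather than a polynomial so that it runs without SciPy; method="polynomial" with an order argument is the SciPy-backed option if you want a smoother fill. The names full_index, partial and completed are illustrative only.

```python
import numpy as np
import pandas as pd

# One full year of hourly timestamps (hypothetical example data)
full_index = pd.date_range("2015-01-01 00:00", "2015-12-31 23:00", freq="60min")
hours = np.arange(len(full_index))
full = pd.DataFrame({"foo": np.sin(2 * np.pi * hours / len(hours))},
                    index=full_index)

# Simulate an incomplete year by dropping all of June
partial = full[full.index.month != 6]

# Reindex to restore the missing timestamps as NaN, then interpolate.
# method="linear" is a stand-in here; method="polynomial", order=2
# would need SciPy installed.
completed = partial.reindex(full_index).interpolate(method="linear")

print(len(partial), len(completed))
print(completed["foo"].isna().sum())
```

The completed frame can then be passed to extrapolate_periodic as before. Gaps at the very start or end of the year are a different case, since interpolation has no value on one side to work from.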
