vwrobel

Reputation: 1736

How to extrapolate a periodic time series in pandas?

In Python 3.5 with pandas 0.20, say I have a periodic time series covering one year:

import pandas as pd
import numpy as np

start_date = pd.to_datetime("2015-01-01T01:00:00.000Z")
end_date = pd.to_datetime("2015-12-31T23:00:00.000Z")
# pd.date_range builds the hourly index; DatetimeIndex(start=..., end=...) is no longer supported in recent pandas
index = pd.date_range(start=start_date, end=end_date, freq="60min")
time = np.array((index - start_date)/ np.timedelta64(1, 'h'), dtype=int)
df = pd.DataFrame(index=index)
df["foo"] = np.sin( 2 * np.pi * time / len(time))

df.plot()


I want to periodically extrapolate the time series onto a new index, for example:

new_start_date = pd.to_datetime("2017-01-01T01:00:00.000Z")
new_end_date = pd.to_datetime("2019-12-31T23:00:00.000Z")
# pd.date_range builds the hourly index; DatetimeIndex(start=..., end=...) is no longer supported in recent pandas
new_index = pd.date_range(start=new_start_date, end=new_end_date, freq="60min")

I would like to use some kind of extrapolate_periodic method to get:

# DO NOT RUN
new_df = df.extrapolate_periodic(index=new_index)
# END DO NOT RUN

new_df.plot()


What is the best way to do such a thing in pandas?

How can I define a periodicity and get data from a new index easily?

Upvotes: 0

Views: 597

Answers (2)

vwrobel

Reputation: 1736

Here is some code I've used to solve my problem. The assumption is that the initial series corresponds to exactly one period of data.

def extrapolate_periodic(df, new_index):
    # Assumes df holds exactly one period of hourly samples, sorted by time.
    start_date = df.index.min()
    # One period spans len(df) hourly steps (end - start would be one step short).
    period = len(df)
    # Whole hours elapsed between start_date and each new timestamp.
    time = np.array((new_index - start_date) / np.timedelta64(1, 'h'), dtype=int)
    new_df = pd.DataFrame(index=new_index)
    for col in df.columns:
        # Wrap the elapsed hours into the original period and look up the value.
        new_df[col] = np.array(df[col].iloc[time % period])
    return new_df
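
For illustration, here is a minimal, self-contained sketch of the same modulo idea on toy data. The one-day period, the dates and the names elapsed and position are assumptions made up for this example. The period is taken to be the number of hourly samples in the original frame.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: exactly one period (one day) of hourly samples.
index = pd.date_range("2015-01-01 00:00", periods=24, freq="60min")
df = pd.DataFrame({"foo": np.sin(2 * np.pi * np.arange(24) / 24)}, index=index)

# New index about two years later, covering three full days.
new_index = pd.date_range("2017-03-05 00:00", periods=72, freq="60min")

# Whole hours elapsed since the first original sample, wrapped into one period.
elapsed = np.array((new_index - index[0]) / np.timedelta64(1, "h"), dtype=int)
position = elapsed % len(df)

# Look up the original value at each wrapped position.
new_df = pd.DataFrame({"foo": df["foo"].to_numpy()[position]}, index=new_index)
```

Each day of new_df should then repeat the original 24 values. This only holds when the new index has the same hourly spacing as the original one.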

Upvotes: 1

Little Bobby Tables

Reputation: 4744

I think I have what you are looking for, though it is not a simple pandas method.

Carrying on directly from where you left off,

def extrapolate_periodic(df, new_index):
    df_right = df.groupby([df.index.dayofyear, df.index.hour]).mean()
    df_left = pd.DataFrame({'new_index': new_index}).set_index('new_index')
    df_left = df_left.assign(dayofyear=lambda x: x.index.dayofyear,
                             hour=lambda x: x.index.hour)
    df = (pd.merge(df_left, df_right, left_on=['dayofyear', 'hour'],
                   right_index=True, suffixes=('', '_y'))
            .drop(['dayofyear', 'hour'], axis=1))
    return df.sort_index()

new_df = extrapolate_periodic(df, new_index)
# or as a method style
# new_df = df.pipe(extrapolate_periodic, new_index)

new_df.plot()

[Figure: date extrapolation plot]

If you have more than a year's worth of data, it will take the mean of each duplicated day-hour. Here mean could be swapped for last if you want just the most recent reading.
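
As a rough illustration of the difference between mean and last, here is a small sketch on made-up data: two non-leap years of hourly readings, constant within each year. The values and the names by_mean and by_last are assumptions for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two non-leap years of hourly readings,
# constant 1.0 in 2014 and constant 3.0 in 2015.
index = pd.date_range("2014-01-01 00:00", "2015-12-31 23:00", freq="60min")
df = pd.DataFrame({"foo": np.where(index.year == 2014, 1.0, 3.0)}, index=index)

# Group on (day of year, hour); each key appears once per year.
keys = [df.index.dayofyear, df.index.hour]
by_mean = df.groupby(keys).mean()  # average of both years for each day-hour
by_last = df.groupby(keys).last()  # most recent reading for each day-hour
```

With this data, by_mean should hold 2.0 everywhere and by_last should hold 3.0, since last keeps the later row of each group when the index is sorted by time.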

This will not work if you do not have a full year's worth of data, but you could fix this by adding a reindex to complete the year and then using interpolate with a polynomial method to fill in the missing values of the foo column.
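
A possible sketch of that reindex-then-interpolate step is shown below. It is not the exact fix described above: it uses time-based linear interpolation (method="time") as a simpler stand-in, because interpolate(method="polynomial", order=...) needs SciPy. The one-day gap and the names full, partial and completed are assumptions for this example.

```python
import numpy as np
import pandas as pd

# A full year of hourly timestamps and a reference sine series on it.
full_index = pd.date_range("2015-01-01 00:00", "2015-12-31 23:00", freq="60min")
hours = np.arange(len(full_index))
full = pd.DataFrame({"foo": np.sin(2 * np.pi * hours / len(hours))}, index=full_index)

# Hypothetical incomplete data: one day (2015-03-01) is missing.
gap = (full_index >= pd.Timestamp("2015-03-01")) & (full_index < pd.Timestamp("2015-03-02"))
partial = full[~gap]

# Reindex to the complete year (the gap becomes NaN), then fill it
# by linear interpolation weighted by the time between samples.
completed = partial.reindex(full_index).interpolate(method="time")
```

Interpolation like this only fills gaps between known samples. Missing data at the very start or end of the year would need a different treatment.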

Upvotes: 1
