A H
A H

Reputation: 2570

Pandas extend datetime index, look up previous year

What is the fastest way to extend a dataframe, by using a previous years' value for next year, e.g.

            col1  col2  col3
2018-04-01     0     0     0
2018-04-02     1     2     3
...
2019-03-31   364   728  1092

becomes

            col1  col2  col3
2019-04-01     0     0     0
2019-04-02     1     2     3
...
2025-03-31   364   728  1092

Here is my test case:

dates = pd.date_range('2018-04-01', '2019-03-31').strftime('%Y-%m-%d')
df    = pd.DataFrame({'col1': range(365),
                     'col2': range(0, 365 * 2, 2),
                     'col3': range(0, 365 * 3, 3)}, index=dates)


assert (extended_df.loc['2019-04-01'] == (0, 0, 0)).all()
assert (extended_df.loc['2019-04-02'] == (1, 2, 3)).all()

Upvotes: 2

Views: 593

Answers (1)

jezrael
jezrael

Reputation: 862641

Solution should be simplier, but because 29. February is necessary first remove this dates for same number of rows for each year and if necessary append them back:

For repeat values is used numpy.tile.

dates = pd.date_range('2018-04-01', '2019-03-31')
df    = pd.DataFrame({'col1': range(365),
                     'col2': range(0, 365 * 2, 2),
                     'col3': range(0, 365 * 3, 3)}, index=dates)

#create range by add 6 years to last value of index
rng=pd.date_range(df.index[0] + pd.offsets.DateOffset(years=1), 
                  df.index[-1] + pd.offsets.DateOffset(years=6))
print (rng)
DatetimeIndex(['2019-04-01', '2019-04-02', '2019-04-03', '2019-04-04',
               '2019-04-05', '2019-04-06', '2019-04-07', '2019-04-08',
               '2019-04-09', '2019-04-10',
               ...
               '2025-03-22', '2025-03-23', '2025-03-24', '2025-03-25',
               '2025-03-26', '2025-03-27', '2025-03-28', '2025-03-29',
               '2025-03-30', '2025-03-31'],
              dtype='datetime64[ns]', length=2192, freq='D')

#filtering 29.2.
mask1 = df.index.strftime('%m-%d') != '02-29'
mask2 = rng.strftime('%m-%d') == '02-29'
rng1 = rng[mask2]
rng2 = rng[~mask2]

#create 29.2. DataFrame
df2 = pd.DataFrame([[0] * len(df.columns)], columns=df.columns, index=rng1)
print (df2)
            col1  col2  col3
2020-02-29     0     0     0
2024-02-29     0     0     0

df = pd.DataFrame(np.tile(df[mask1].values, (6, 1)),
                  columns=df.columns,
                  index=rng2).append(df2)
print (df.head())
            col1  col2  col3
2019-04-01     0     0     0
2019-04-02     1     2     3
2019-04-03     2     4     6
2019-04-04     3     6     9
2019-04-05     4     8    12

print (df.tail())
            col1  col2  col3
2025-03-29   362   724  1086
2025-03-30   363   726  1089
2025-03-31   364   728  1092
2020-02-29     0     0     0
2024-02-29     0     0     0

#last sorting for correct align 29.2 dates
df = df.sort_index()
#print (df)

Upvotes: 3

Related Questions