How to extend a pandas dataframe by repeating the last row and incrementing the index by one year at the same time

Question

I have the following dataframe:

import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)


              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0

I am extending the dataframe with the last row as follows:

for i in range(3):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, df[-1:]])
print(df)

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0
2014-01-04  0.0  0.0  0.0  1.0

However, I would also like the index of each appended row to be incremented by one year. How can I do that?

Expected result:

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0

Many thanks,

Mabel Villalba · Accepted Answer

In a few lines:

rows_to_add = 10

new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                               for y in range(rows_to_add)])

df.reindex(df.index.union(new_dates).unique().sort_values()).ffill()

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0
2018-01-04  0.0  0.0  0.0  1.0
2019-01-04  0.0  0.0  0.0  1.0
2020-01-04  0.0  0.0  0.0  1.0
2021-01-04  0.0  0.0  0.0  1.0
2022-01-04  0.0  0.0  0.0  1.0
2023-01-04  0.0  0.0  0.0  1.0

Explained

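First build the new dates. Because range(rows_to_add) starts at 0, the first date generated is the existing last date, so rows_to_add = 10 yields nine genuinely new rows: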

rows_to_add = 10

new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                               for y in range(rows_to_add)])

DatetimeIndex(['2014-01-04', '2015-01-04', '2016-01-04', '2017-01-04',
               '2018-01-04', '2019-01-04', '2020-01-04', '2021-01-04',
               '2022-01-04', '2023-01-04'],
              dtype='datetime64[ns]', freq=None)
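To get exactly rows_to_add new dates, one possible variant (a sketch, not part of the original answer) starts the range at 1, which skips the offset of zero years. The value rows_to_add = 3 is chosen here only to match the expected result in the question:

```python
import numpy as np
import pandas as pd

# Same dataframe as in the question.
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])

rows_to_add = 3  # illustrative value, matching the question's expected output

# Start at 1 so the existing last date is not generated again.
new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                              for y in range(1, rows_to_add + 1)])
print(new_dates)
```

With this variant every entry of new_dates is later than the last date already in df.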

Then merge these dates with the original index, keeping each date only once (2014-01-04 appears in both) and sorting the result:

new_index = df.index.union(new_dates).unique().sort_values()

DatetimeIndex(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04',
               '2015-01-04', '2016-01-04', '2017-01-04', '2018-01-04',
               '2019-01-04', '2020-01-04', '2021-01-04', '2022-01-04',
               '2023-01-04'],
              dtype='datetime64[ns]', freq=None)
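For this particular input (two unique, sortable indexes), union with its default sort behaviour should already return a sorted, duplicate-free index, so the extra unique() and sort_values() calls act as a safeguard rather than a requirement. A quick check, assuming a reasonably recent pandas version (plain_union and explicit are names introduced only for this illustration):

```python
import numpy as np
import pandas as pd

# Same dataframe as in the question.
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])
new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                              for y in range(4)])

# Union alone versus union followed by explicit de-duplication and sorting.
plain_union = df.index.union(new_dates)
explicit = df.index.union(new_dates).unique().sort_values()

# Compare the two results and inspect the properties of the plain union.
print(plain_union.equals(explicit))
print(plain_union.is_unique, plain_union.is_monotonic_increasing)
```

If the original index could contain duplicates or unsortable values, keeping the explicit calls is the more cautious choice.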

Finally, reindex the original dataframe. The rows for the new dates start out as NaN, and ffill() fills them with the values from the last existing row:

df.reindex(new_index).ffill()

              A    B    C    D
2014-01-01  1.0  0.0  0.0  0.0
2014-01-02  0.0  1.0  0.0  0.0
2014-01-03  0.0  0.0  1.0  0.0
2014-01-04  0.0  0.0  0.0  1.0
2015-01-04  0.0  0.0  0.0  1.0
2016-01-04  0.0  0.0  0.0  1.0
2017-01-04  0.0  0.0  0.0  1.0
2018-01-04  0.0  0.0  0.0  1.0
2019-01-04  0.0  0.0  0.0  1.0
2020-01-04  0.0  0.0  0.0  1.0
2021-01-04  0.0  0.0  0.0  1.0
2022-01-04  0.0  0.0  0.0  1.0
2023-01-04  0.0  0.0  0.0  1.0
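As an alternative to reindexing, the last row can be repeated explicitly and given the new dates before concatenating. The following is a minimal sketch under that assumption. The helper name extend_last_row_by_years and its n_years parameter are hypothetical, introduced only for this example, and it assumes the dataframe has a DatetimeIndex:

```python
import numpy as np
import pandas as pd


def extend_last_row_by_years(df, n_years):
    """Return df plus n_years copies of its last row, one per following year."""
    last_date = df.index[-1]
    # Offsets 1..n_years, so the existing last date is not duplicated.
    new_index = pd.DatetimeIndex([last_date + pd.DateOffset(years=y)
                                  for y in range(1, n_years + 1)])
    # Select the last row n_years times; this keeps the column dtypes.
    repeated = df.iloc[[-1] * n_years].copy()
    repeated.index = new_index
    return pd.concat([df, repeated])


# Same dataframe as in the question.
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])

extended = extend_last_row_by_years(df, 3)
print(extended)
```

Unlike ffill(), this approach copies the last row as-is, so a NaN in that row stays NaN instead of being filled from an earlier row.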
