Reputation: 619
I have the following dataframe:
import numpy as np
import pandas as pd
dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
A B C D
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
I am extending the dataframe with the last row as follows:
for i in range(3):
    # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
    df = pd.concat([df, df[-1:]])
print(df)
A B C D
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
2014-01-04 0.0 0.0 0.0 1.0
2014-01-04 0.0 0.0 0.0 1.0
2014-01-04 0.0 0.0 0.0 1.0
However, I would like to also increment the index by a year at the same time. Any idea on how to do that?
Expected result:
A B C D
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
2015-01-04 0.0 0.0 0.0 1.0
2016-01-04 0.0 0.0 0.0 1.0
2017-01-04 0.0 0.0 0.0 1.0
Many thanks,
Upvotes: 1
Views: 1058
Reputation: 2598
In a few lines:
rows_to_add = 10  # y runs from 0 to 9; y=0 reproduces the existing last date
new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                              for y in range(rows_to_add)])
df.reindex(df.index.union(new_dates).unique().sort_values()).ffill()
A B C D
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
2015-01-04 0.0 0.0 0.0 1.0
2016-01-04 0.0 0.0 0.0 1.0
2017-01-04 0.0 0.0 0.0 1.0
2018-01-04 0.0 0.0 0.0 1.0
2019-01-04 0.0 0.0 0.0 1.0
2020-01-04 0.0 0.0 0.0 1.0
2021-01-04 0.0 0.0 0.0 1.0
2022-01-04 0.0 0.0 0.0 1.0
2023-01-04 0.0 0.0 0.0 1.0
Explained
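You can create the new dates as follows. Because range(rows_to_add) starts at 0, the first date is the existing last index entry, so only rows_to_add - 1 of them are actually new:
rows_to_add = 10
new_dates = pd.DatetimeIndex([df.index[-1] + pd.DateOffset(years=y)
                              for y in range(rows_to_add)])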
DatetimeIndex(['2014-01-04', '2015-01-04', '2016-01-04', '2017-01-04',
'2018-01-04', '2019-01-04', '2020-01-04', '2021-01-04',
'2022-01-04', '2023-01-04'],
dtype='datetime64[ns]', freq=None)
Then merge these dates with the original index, keeping only unique dates and sorting the result:
new_index = df.index.union(new_dates).unique().sort_values()
DatetimeIndex(['2014-01-01', '2014-01-02', '2014-01-03', '2014-01-04',
'2015-01-04', '2016-01-04', '2017-01-04', '2018-01-04',
'2019-01-04', '2020-01-04', '2021-01-04', '2022-01-04',
'2023-01-04'],
dtype='datetime64[ns]', freq=None)
Finally, reindex the original dataframe. The new rows start out as NaN, and ffill() fills them with the values from the last row:
df.reindex(new_index).ffill()
A B C D
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
2015-01-04 0.0 0.0 0.0 1.0
2016-01-04 0.0 0.0 0.0 1.0
2017-01-04 0.0 0.0 0.0 1.0
2018-01-04 0.0 0.0 0.0 1.0
2019-01-04 0.0 0.0 0.0 1.0
2020-01-04 0.0 0.0 0.0 1.0
2021-01-04 0.0 0.0 0.0 1.0
2022-01-04 0.0 0.0 0.0 1.0
2023-01-04 0.0 0.0 0.0 1.0
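As a rough sketch, the three steps above could be wrapped in a small helper. The name extend_yearly and the parameter n_years are made up for illustration. The offsets start at 1 here (an assumption that differs from the snippet above), so exactly n_years new rows are added.

```python
import numpy as np
import pandas as pd


def extend_yearly(df, n_years):
    """Hypothetical helper: add n_years copies of the last row, one year apart."""
    last = df.index[-1]
    # Offsets start at 1, so the existing last date is not generated again.
    new_dates = pd.DatetimeIndex(
        [last + pd.DateOffset(years=y) for y in range(1, n_years + 1)]
    )
    # union() combines both indexes into one sorted index.
    new_index = df.index.union(new_dates)
    # reindex() creates NaN rows for the new dates; ffill() copies values down.
    return df.reindex(new_index).ffill()


dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])
result = extend_yearly(df, 3)
print(result)
```

One caveat: ffill() fills every NaN it finds, so if the original frame already has missing values, those get filled as well.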
Upvotes: 2
Reputation: 4792
Use:
df[-1:].index
DatetimeIndex(['2014-01-04'], dtype='datetime64[ns]', freq='D')
dates_new = pd.date_range(df.index[-1], periods=4, freq=pd.DateOffset(years=1))
# set periods to the number of rows you want to add + 1, because the first date is the existing last row
dates_new
DatetimeIndex(['2014-01-04', '2015-01-04', '2016-01-04', '2017-01-04'], dtype='datetime64[ns]', freq='<DateOffset: years=1>')
df_new = pd.DataFrame(index=dates_new, columns=['A', 'B', 'C', 'D'])
# pd.datetime was removed in pandas 1.0+; pd.Timestamp looks up the same row
df_new = df_new.apply(lambda x: df.loc[pd.Timestamp('2014-01-04')], axis=1)
df_new
A B C D
2014-01-04 0.0 0.0 0.0 1.0
2015-01-04 0.0 0.0 0.0 1.0
2016-01-04 0.0 0.0 0.0 1.0
2017-01-04 0.0 0.0 0.0 1.0
# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
df = pd.concat([df, df_new])
A B C D
2014-01-01 1.0 0.0 0.0 0.0
2014-01-02 0.0 1.0 0.0 0.0
2014-01-03 0.0 0.0 1.0 0.0
2014-01-04 0.0 0.0 0.0 1.0
2014-01-04 0.0 0.0 0.0 1.0
2015-01-04 0.0 0.0 0.0 1.0
2016-01-04 0.0 0.0 0.0 1.0
2017-01-04 0.0 0.0 0.0 1.0
This kinda feels like a hack, and it leaves a duplicate 2014-01-04 row.
You can remove the duplicate index entry using:
df = df[~df.index.duplicated(keep='first')]
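For what it's worth, the same idea could be sketched so that no duplicate is created in the first place, by starting the date range one year after the last row. This is only a sketch: the names rows_to_add, last_values and extended are illustrative, and it uses pd.concat because DataFrame.append no longer exists in pandas 2.0 and later.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('1/1/2014', periods=4)
df = pd.DataFrame(np.eye(4, 4), index=dates, columns=['A', 'B', 'C', 'D'])

rows_to_add = 3  # illustrative value
last_date = df.index[-1]

# Start one year after the last existing date so no duplicate index appears.
dates_new = pd.date_range(last_date + pd.DateOffset(years=1),
                          periods=rows_to_add,
                          freq=pd.DateOffset(years=1))

# Repeat the last row's values once per new date.
last_values = df.iloc[-1].to_numpy()
df_new = pd.DataFrame(np.tile(last_values, (rows_to_add, 1)),
                      index=dates_new, columns=df.columns)

extended = pd.concat([df, df_new])
print(extended)
```

With this variant, periods is simply the number of rows to add, and the duplicated() clean-up step is not needed.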
Upvotes: 0