Reputation: 2487
I'm working on pulling financial data, in which some is formatted in yearly and other is monthly. My model will need all of it monthly, therefore I need that same yearly value repeated for each month. I've been using this stack post and trying to adapt the code to my data.
Here is my dataframe:
df.head()
date ticker value
0 1999-12-31 ECB/RA6 1.0
1 2000-12-31 ECB/RA6 4.0
2 2001-12-31 ECB/RA6 2.0
3 2002-12-31 ECB/RA6 3.0
4 2003-12-31 ECB/RA6 2.0
Here is my desired output first 5 rows:
date ticker value
0 1999-12-31 ECB/RA6 1.0
1 2000-01-31 ECB/RA6 4.0
2 2000-02-28 ECB/RA6 4.0
3 2000-13-31 ECB/RA6 4.0
4 2000-04-30 ECB/RA6 4.0
And my code:
df['date'] = pd.to_datetime(df['date'], format='%Y-%m')
df = df.pivot(index='date', columns='ticker')
start_date = df.index.min() - pd.DateOffset(day=1)
end_date = df.index.max() + pd.DateOffset(day=31)
dates = pd.date_range(start_date, end_date, freq='M')
dates.name = 'date'
df = df.reindex(dates, method='ffill')
df = df.stack('ticker')
df = df.sortlevel(level=1)
df = df.reset_index()
However, it is not repeating the months as expected
Upvotes: 8
Views: 4502
Reputation: 294198
You want resample
First, you need to set the index so that resample
will work. Then you backfill and reset the index.
df.set_index('date').resample('M').bfill().reset_index()
date ticker value
0 1999-12-31 ECB/RA6 1.0
1 2000-01-31 ECB/RA6 4.0
2 2000-02-29 ECB/RA6 4.0
3 2000-03-31 ECB/RA6 4.0
4 2000-04-30 ECB/RA6 4.0
5 2000-05-31 ECB/RA6 4.0
6 2000-06-30 ECB/RA6 4.0
7 2000-07-31 ECB/RA6 4.0
8 2000-08-31 ECB/RA6 4.0
9 2000-09-30 ECB/RA6 4.0
10 2000-10-31 ECB/RA6 4.0
11 2000-11-30 ECB/RA6 4.0
12 2000-12-31 ECB/RA6 4.0
13 2001-01-31 ECB/RA6 2.0
14 2001-02-28 ECB/RA6 2.0
15 2001-03-31 ECB/RA6 2.0
...
To handle this per ticker
df.set_index('date').groupby('ticker', group_keys=False) \
.resample('M').bfill().reset_index()
Upvotes: 7