Reputation: 10011
For a quarterly dataframe like this:
date gdp rate
0 2003/3/1 523.82 0.1
1 2003/6/1 1172.83 0.2
2 2003/9/1 1882.48 0.4
3 2003/12/1 3585.72 0.1
4 2004/3/1 706.77 0.2
I want to fill missing months and back fill gdp
and rate
for each month, and get a dataframe like this:
date gdp rate
0 2003-03 523.82 0.1
1 2003-04 1172.83 0.2
2 2003-05 1172.83 0.2
3 2003-06 1172.83 0.2
4 2003-07 1882.48 0.4
5 2003-08 1882.48 0.4
6 2003-09 1882.48 0.4
7 2003-10 3585.72 0.1
8 2003-11 3585.72 0.1
9 2003-12 3585.72 0.1
10 2004-01 706.77 0.2
11 2004-02 706.77 0.2
12 2004-03 706.77 0.2
How can I do that in Pandas? Thanks.
Upvotes: 4
Views: 1208
Reputation: 862591
Another solution is create DatetimeIndex
, then use DataFrame.asfreq
with method='bfill'
and MS
for start of month and last convert to periods by DataFrame.to_period
:
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by=['date'], ascending=[True])
df.set_index('date', inplace=True)
df = df.asfreq('MS', method='bfill').to_period('M').reset_index()
print (df)
date gdp rate
0 2003-03 523.82 0.1
1 2003-04 1172.83 0.2
2 2003-05 1172.83 0.2
3 2003-06 1172.83 0.2
4 2003-07 1882.48 0.4
5 2003-08 1882.48 0.4
6 2003-09 1882.48 0.4
7 2003-10 3585.72 0.1
8 2003-11 3585.72 0.1
9 2003-12 3585.72 0.1
10 2004-01 706.77 0.2
11 2004-02 706.77 0.2
12 2004-03 706.77 0.2
Upvotes: 2
Reputation: 10011
This works:
import pandas as pd
df['date'] = pd.to_datetime(df['date']).dt.to_period('M')
# df['date'] = pd.to_datetime(df['date'], format='%Y/%m/%d')
df = df.sort_values(by=['date'], ascending=[True])
df.set_index('date', inplace=True)
df = df.resample('M').bfill().reset_index()
print(df)
output:
date gdp rate
0 2003-03 523.82 0.1
1 2003-04 1172.83 0.2
2 2003-05 1172.83 0.2
3 2003-06 1172.83 0.2
4 2003-07 1882.48 0.4
5 2003-08 1882.48 0.4
6 2003-09 1882.48 0.4
7 2003-10 3585.72 0.1
8 2003-11 3585.72 0.1
9 2003-12 3585.72 0.1
10 2004-01 706.77 0.2
11 2004-02 706.77 0.2
12 2004-03 706.77 0.2
Upvotes: 0