Reputation: 121
On every meeting date, new gdp growth forecasts are made 3 to 4 years into the future. If the gdp growth forecast for a forecast_year
is similar to what it was in the last meeting_date
, it does not get a new entry in the table.
Is there an easy way to add these missing forecast_year
entries for all meeting_dates
, quoting the most recent gdp_growth(%)
figures available as of the latest meeting_date
?
To clarify, here is the input table df_in
:
meeting_date | forecast_year | gdp_growth (%) |
---|---|---|
2007-11-20 | 2007 | 2.45 |
2007-11-20 | 2008 | 2.15 |
2007-11-20 | 2009 | 2.50 |
2007-11-20 | 2010 | 2.55 |
2008-02-20 | 2008 | 1.65 |
2008-02-20 | 2009 | 2.40 |
2008-02-20 | 2010 | 2.75 |
2008-05-21 | 2008 | 0.75 |
2008-05-21 | 2010 | 2.85 |
2008-07-16 | 2008 | 1.30 |
2008-07-16 | 2010 | 2.75 |
2008-11-19 | 2008 | 0.15 |
2008-11-19 | 2009 | 0.45 |
2008-11-19 | 2011 | 3.20 |
Here is the shape of the output table I need to output, df_out
. Rows that need to be added are in bold. What pandas manipulations are needed to transform df_in
into df_out
?
meeting_date | forecast_year | GPD Growth (%) |
---|---|---|
2007-11-20 | 2007 | 2.45 |
2007-11-20 | 2008 | 2.15 |
2007-11-20 | 2009 | 2.50 |
2007-11-20 | 2010 | 2.55 |
2008-02-20 | 2008 | 1.65 |
2008-02-20 | 2009 | 2.40 |
2008-02-20 | 2010 | 2.75 |
2008-05-21 | 2008 | 0.75 |
2008-05-21 | 2009 | 2.40 |
2008-05-21 | 2010 | 2.85 |
2008-07-16 | 2008 | 1.30 |
2008-07-16 | 2009 | 2.40 |
2008-07-16 | 2010 | 2.75 |
2008-11-19 | 2008 | 0.15 |
2008-11-19 | 2009 | 0.45 |
2008-11-19 | 2010 | 2.75 |
2008-11-19 | 2011 | 3.20 |
Upvotes: 1
Views: 189
Reputation: 195613
Try:
x = (
df.set_index(["meeting_date", "forecast_year"])
.unstack(level=1)
.ffill()
.stack()
.reset_index()
)
# remove rows where meeting_date > forecast_year
x = x[~(pd.to_datetime(x["meeting_date"]).dt.year > x["forecast_year"])]
print(x)
Prints:
meeting_date forecast_year gdp_growth (%)
0 2007-11-20 2007 2.45
1 2007-11-20 2008 2.15
2 2007-11-20 2009 2.50
3 2007-11-20 2010 2.55
5 2008-02-20 2008 1.65
6 2008-02-20 2009 2.40
7 2008-02-20 2010 2.75
9 2008-05-21 2008 0.75
10 2008-05-21 2009 2.40
11 2008-05-21 2010 2.85
13 2008-07-16 2008 1.30
14 2008-07-16 2009 2.40
15 2008-07-16 2010 2.75
17 2008-11-19 2008 0.15
18 2008-11-19 2009 0.45
19 2008-11-19 2010 2.75
20 2008-11-19 2011 3.20
EDIT: Removed MultiIndex.from_product
- not needed
Upvotes: 2
Reputation: 14949
Another way via pivot
:
k = df1.pivot(*df1).ffill().stack().reset_index(name = 'GPD Growth (%)')
df = k[~(pd.to_datetime(k["meeting_date"]).dt.year.gt(k["forecast_year"]))]
OUTPUT:
meeting_date forecast_year GPD Growth (%)
0 2007-11-20 2007 2.45
1 2007-11-20 2008 2.15
2 2007-11-20 2009 2.50
3 2007-11-20 2010 2.55
5 2008-02-20 2008 1.65
6 2008-02-20 2009 2.40
7 2008-02-20 2010 2.75
9 2008-05-21 2008 0.75
10 2008-05-21 2009 2.40
11 2008-05-21 2010 2.85
13 2008-07-16 2008 1.30
14 2008-07-16 2009 2.40
15 2008-07-16 2010 2.75
17 2008-11-19 2008 0.15
18 2008-11-19 2009 0.45
19 2008-11-19 2010 2.75
20 2008-11-19 2011 3.20
Upvotes: 3