Obtain most recent value for based on index in a pandas dataframe

Question

On every meeting date, new gdp growth forecasts are made 3 to 4 years into the future. If the gdp growth forecast for a forecast_year is similar to what it was in the last meeting_date, it does not get a new entry in the table.

Is there an easy way to add these missing forecast_year entries for all meeting_dates, quoting the most recent gdp_growth(%) figures available as of the latest meeting_date?

To clarify, here is the input table df_in:

meeting_date	forecast_year	gdp_growth (%)
2007-11-20	2007	2.45
2007-11-20	2008	2.15
2007-11-20	2009	2.50
2007-11-20	2010	2.55
2008-02-20	2008	1.65
2008-02-20	2009	2.40
2008-02-20	2010	2.75
2008-05-21	2008	0.75
2008-05-21	2010	2.85
2008-07-16	2008	1.30
2008-07-16	2010	2.75
2008-11-19	2008	0.15
2008-11-19	2009	0.45
2008-11-19	2011	3.20

Here is the shape of the output table I need to output, df_out. Rows that need to be added are in bold. What pandas manipulations are needed to transform df_in into df_out?

meeting_date	forecast_year	GPD Growth (%)
2007-11-20	2007	2.45
2007-11-20	2008	2.15
2007-11-20	2009	2.50
2007-11-20	2010	2.55
2008-02-20	2008	1.65
2008-02-20	2009	2.40
2008-02-20	2010	2.75
2008-05-21	2008	0.75
2008-05-21	2009	2.40
2008-05-21	2010	2.85
2008-07-16	2008	1.30
2008-07-16	2009	2.40
2008-07-16	2010	2.75
2008-11-19	2008	0.15
2008-11-19	2009	0.45
2008-11-19	2010	2.75
2008-11-19	2011	3.20

Nk03 · Accepted Answer

Another way via pivot:

k = df1.pivot(*df1).ffill().stack().reset_index(name = 'GPD Growth (%)')
df = k[~(pd.to_datetime(k["meeting_date"]).dt.year.gt(k["forecast_year"]))]

OUTPUT:

   meeting_date  forecast_year  GPD Growth (%)
0    2007-11-20           2007            2.45
1    2007-11-20           2008            2.15
2    2007-11-20           2009            2.50
3    2007-11-20           2010            2.55
5    2008-02-20           2008            1.65
6    2008-02-20           2009            2.40
7    2008-02-20           2010            2.75
9    2008-05-21           2008            0.75
10   2008-05-21           2009            2.40
11   2008-05-21           2010            2.85
13   2008-07-16           2008            1.30
14   2008-07-16           2009            2.40
15   2008-07-16           2010            2.75
17   2008-11-19           2008            0.15
18   2008-11-19           2009            0.45
19   2008-11-19           2010            2.75
20   2008-11-19           2011            3.20

Obtain most recent value for based on index in a pandas dataframe

Answers (2)

Related Questions