[Numpy/Pandas]How can I efficiently create a panel data set from transaction records?

Question

I have data arranged in the following form:

ID,DATE,STATUS
1,6/20/2011,A
1,1/14/2013,B
1,8/1/2016,C
2,3/1/2005,A
2,4/30/2005,B
2,6/30/2010,C
2,8/20/2010,D

I want to convert these transactions into an unbalanced panel with an annual frequency:

ID,YEAR,STATUS
1,2011,A
1,2012,A
1,2013,B
1,2014,B
1,2015,B
1,2016,C
2,2005,B
2,2006,B
2,2007,B
2,2008,B
2,2009,B
2,2010,D

So basically I want an annual series for each ID that spans the first through last date observed for that ID. The status in each year will be the last status observed in the year if there is more than one record for that year, or the last observed status if there is no date in that year.

This is a big dataset, so a good answer needs to use efficient methods provided by numpy/pandas.

[Numpy/Pandas]How can I efficiently create a panel data set from transaction records?

Answers (1)

Related Questions