Unbalanced Panel Data: Calculation of index based on the first Year of observation

Question

I have once again a question with respect to efficiently conduct some calculation without loops since working with a large data base.

The underlying unbalanced panel data set takes the form of df1 below. As can be seen, there are individuals (ID) observed for different years. Sometimes also with gab years (see ID 4):

library(data.table)
 df1 = data.table( Year = c(2000, 2001,2002, 2003, 2000, 2001, 2000,
 2001,2002, 2003, 2004, 2001,2003,2004),
                  ID = c(1,1,1,1,2,2,3,3,3,3,3,4,4,4),
                  V1 = c(10, 9, 8,11, 2,3, 16,12,15,16,20, 8,7,9),
                  V2 = c(2,3,2,4,0.5,1, 4,6,5,8,9, 8,10,11)
                  )

I would like to create a variable V3 in which for each ID V1/V2 is calculated for the first year of observation. For the following years of observation this value is maintained or copied. See below the desired output in df2:

 df2 = data.table( Year = c(2000, 2001,2002, 2003, 2000, 2001, 2000,
                               2001,2002, 2003, 2004,2001,2003,2004),
                   ID = c(1,1,1,1,2,2,3,3,3,3,3,4,4,4),
                   V1 = c(10, 9, 8,11, 2,3, 12,16,15,16,20, 8,7,9),
                   V2 = c(2,3,2,4, 0.5,1, 4,6,5,8,9 ,8,10,11),
                   V3 = c(5,5,5,5,4,4,3,3,3,3,3, 1,1,1   )
                   )

Sotos · Accepted Answer

You can use first from data.table to only do the calculation of the first Year (assuming your data is ordered), i.e.

library(data.table)

df1[, v3 := first(V1)/first(V2), by = ID][]

NOTE: You have some discrepancies with your data (row 8) hence the result is not identical to what you posted

Unbalanced Panel Data: Calculation of index based on the first Year of observation

Answers (1)

Related Questions