Enrico
Enrico

Reputation: 77

Unbalanced Panel Data: Calculation of index based on the first Year of observation

I have once again a question with respect to efficiently conduct some calculation without loops since working with a large data base.

The underlying unbalanced panel data set takes the form of df1 below. As can be seen, there are individuals (ID) observed for different years. Sometimes also with gab years (see ID 4):

library(data.table)
 df1 = data.table( Year = c(2000, 2001,2002, 2003, 2000, 2001, 2000,
 2001,2002, 2003, 2004, 2001,2003,2004),
                  ID = c(1,1,1,1,2,2,3,3,3,3,3,4,4,4),
                  V1 = c(10, 9, 8,11, 2,3, 16,12,15,16,20, 8,7,9),
                  V2 = c(2,3,2,4,0.5,1, 4,6,5,8,9, 8,10,11)
                  )

I would like to create a variable V3 in which for each ID V1/V2 is calculated for the first year of observation. For the following years of observation this value is maintained or copied. See below the desired output in df2:

 df2 = data.table( Year = c(2000, 2001,2002, 2003, 2000, 2001, 2000,
                               2001,2002, 2003, 2004,2001,2003,2004),
                   ID = c(1,1,1,1,2,2,3,3,3,3,3,4,4,4),
                   V1 = c(10, 9, 8,11, 2,3, 12,16,15,16,20, 8,7,9),
                   V2 = c(2,3,2,4, 0.5,1, 4,6,5,8,9 ,8,10,11),
                   V3 = c(5,5,5,5,4,4,3,3,3,3,3, 1,1,1   )
                   )

Upvotes: 0

Views: 98

Answers (1)

Sotos
Sotos

Reputation: 51592

You can use first from data.table to only do the calculation of the first Year (assuming your data is ordered), i.e.

library(data.table)

df1[, v3 := first(V1)/first(V2), by = ID][]

NOTE: You have some discrepancies with your data (row 8) hence the result is not identical to what you posted

Upvotes: 1

Related Questions