naveendaftari

Reputation: 81

Define a variable iteratively in a data.table in R

I am trying to find a faster way to define a variable iteratively, i.e., where the value in each row depends on the value in the previous row. For example, suppose I have the following data.table:

library(data.table)

tmp <- data.table(type = c("A", "A", "A", "B", "B", "B"),
                  year = c(2011, 2012, 2013, 2011, 2012, 2013),
                  alpha = c(1,1,1,2,2,2),
                  beta = c(3,3,3,4,4,4),
                  pred = c(1,NA,NA,2,NA, NA))

For each type (A and then B), I want to solve for pred going forward, where pred for type A for the year 2012 is:

pred_2012_A = alpha + beta * pred_2011_A

and the pred for 2013 for type A continues:

pred_2013_A = alpha + beta * pred_2012_A

I have a solution that uses a for loop over type, stores the previous value in a variable, and uses data.table's "by" argument to step through the years:

for(i in c("A", "B")){
  tmp.val <- tmp[type == i & year == 2011]$pred # initial value for type i
  tmp[year > 2011 & type == i, pred := {
    tmp.val <- alpha + beta * tmp.val
  }, by = year]
}

Ultimately, the original data table looks like:

   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3   NA
3:    A 2013     1    3   NA
4:    B 2011     2    4    2
5:    B 2012     2    4   NA
6:    B 2013     2    4   NA

And the updated table looks like:

   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3    4
3:    A 2013     1    3   13
4:    B 2011     2    4    2
5:    B 2012     2    4   10
6:    B 2013     2    4   42

Is there a faster way to implement this without the for loop, ideally in a single data.table statement? My real use case has many more types and many more years to compute, so a faster implementation would be greatly appreciated.

Thank you.

Upvotes: 1

Views: 186

Answers (2)

shayaa

Reputation: 2797

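A bit hacky, but bear with me: on this data it takes only two iterations (one per missing year within each type). Note that lag() here is dplyr::lag(), so dplyr needs to be loaded.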

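library(dplyr)  # lag() below is dplyr::lag(), which shifts a vector by one position

df <- read.table(text = "type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3   NA
3:    A 2013     1    3   NA
4:    B 2011     2    4    2
5:    B 2012     2    4   NA
6:    B 2013     2    4   NA", header = TRUE)

df2 <- df
known <- which(!is.na(df$pred))  # rows holding the initial values

# Each pass fills in one more year per type
while(any(is.na(df2$pred))){
  df2$pred <- df2$alpha + df2$beta*lag(df2$pred)
  df2$pred[known] <- df$pred[known]  # restore the initial values
}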

The result matches the expected output:

df2
   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3    4
3:    A 2013     1    3   13
4:    B 2011     2    4    2
5:    B 2012     2    4   10
6:    B 2013     2    4   42

Upvotes: 0

Frank

Reputation: 66819

You can just do the math:

tmp[, pred := pred[1]*beta^(1:.N-1) + alpha*cumsum(c(0, beta[1]^(0:(.N-2)))), by=type]

#    type year alpha beta pred
# 1:    A 2011     1    3    1
# 2:    A 2012     1    3    4
# 3:    A 2013     1    3   13
# 4:    B 2011     2    4    2
# 5:    B 2012     2    4   10
# 6:    B 2013     2    4   42
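
For readers wondering where that one-liner comes from, here is a short derivation sketch. The symbols p_k (pred k years after the first year) and α, β (the per-type alpha and beta) are notation introduced here, not from the thread, and the derivation assumes alpha and beta are constant within each type, as in the example data.

```latex
\begin{aligned}
p_k &= \alpha + \beta\, p_{k-1} \\
    &= \alpha + \beta\,(\alpha + \beta\, p_{k-2}) \\
    &\;\;\vdots \\
    &= \beta^{k} p_0 + \alpha \sum_{j=0}^{k-1} \beta^{j}
\end{aligned}
```

The first term corresponds to pred[1]*beta^(1:.N-1) and the sum to the cumsum(...) term. For type A in 2013 (k = 2) this gives 3^2 * 1 + 1 * (1 + 3) = 13, matching the table. As written, the 0:(.N-2) index appears to assume at least two rows per type.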

Comment. In my opinion, the data structure in the OP is flawed. Alpha and beta are clearly attributes of the type, not something that is varying from row to row. It should start with:

typeDT = data.table(
  type = c("A","B"),
  year.start = 2011L,
  year.end = 2013L,
  a = 1:2,
  b = 3:4,
  pred0 = 1:2
)

#    type year.start year.end a b pred0
# 1:    A       2011     2013 1 3     1
# 2:    B       2011     2013 2 4     2

With this structure, you could expand to your data set naturally:

typeDT[, {
  year = year.start:year.end
  n    = length(year)
  p    = pred0*b^(0:(n-1)) + a*cumsum(c(0, b^(0:(n-2))))
  .(year = year, pred = p)
}, by=type]

#    type year pred
# 1:    A 2011    1
# 2:    A 2012    4
# 3:    A 2013   13
# 4:    B 2011    2
# 5:    B 2012   10
# 6:    B 2013   42
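
As a cross-check on the closed form, the recurrence can also be folded directly with base R's Reduce(). This is only a sketch: iterate_pred is a hypothetical helper name, not something from the thread. It would also cover alpha and beta that change from year to year, which the closed form does not.

```r
# Hypothetical helper (not from the thread): iterate
#   pred[t] = alpha[t] + beta[t] * pred[t-1]
# starting from a known first value pred0.
iterate_pred <- function(pred0, alpha, beta) {
  n <- length(alpha)
  if (n <= 1L) return(pred0)          # nothing to iterate for a single row
  steps <- seq_len(n)[-1L]            # positions 2..n still to be filled
  Reduce(function(prev, t) alpha[t] + beta[t] * prev,
         steps, init = pred0, accumulate = TRUE)
}

# Values for type A and type B from the question
pred_A <- iterate_pred(1, c(1, 1, 1), c(3, 3, 3))
pred_B <- iterate_pred(2, c(2, 2, 2), c(4, 4, 4))
print(pred_A)
print(pred_B)
```

With the question's table, something like tmp[, pred := iterate_pred(pred[1], alpha, beta), by = type] should give the same result, though it will likely be slower than the closed form on large data because Reduce() loops in R.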

Upvotes: 3
