Reputation: 81
I am trying to find a faster way to define a variable iteratively, i.e., where each row's value depends on the previous row's value. For example, suppose I have the following data.table:
library(data.table)
tmp <- data.table(type  = c("A", "A", "A", "B", "B", "B"),
                  year  = c(2011, 2012, 2013, 2011, 2012, 2013),
                  alpha = c(1, 1, 1, 2, 2, 2),
                  beta  = c(3, 3, 3, 4, 4, 4),
                  pred  = c(1, NA, NA, 2, NA, NA))
For each type (A, then B), I want to compute pred forward in time. For type A, pred for 2012 is:
pred_2012_A = alpha + beta * pred_2011_A
and the pred for 2013 for type A continues:
pred_2013_A = alpha + beta * pred_2012_A
My current solution uses a for loop over type, stores the previous value in a temporary variable, and uses data.table's "by" argument to step through the years:
for (i in c("A", "B")) {
  tmp.val <- tmp[type == i & year == 2011]$pred  # initial value for type i
  tmp[year > 2011 & type == i, pred := {
    tmp.val <- alpha + beta * tmp.val
  }, by = year]
}
The original data.table looks like:
   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3   NA
3:    A 2013     1    3   NA
4:    B 2011     2    4    2
5:    B 2012     2    4   NA
6:    B 2013     2    4   NA
And the updated table looks like:
   type year alpha beta pred
1:    A 2011     1    3    1
2:    A 2012     1    3    4
3:    A 2013     1    3   13
4:    B 2011     2    4    2
5:    B 2012     2    4   10
6:    B 2013     2    4   42
Is there a faster way to implement this without the for loop, ideally in a single data.table statement? My real data has many more types and many more years, so a faster implementation would be greatly appreciated.
Thank you.
Upvotes: 1
Views: 186
Reputation: 2797
A bit hacky, but bear with me: for this data it takes only two iterations, because each pass fills in one more year per type.
df <- read.table(text = "type year alpha beta pred
1: A 2011 1 3 1
2: A 2012 1 3 NA
3: A 2013 1 3 NA
4: B 2011 2 4 2
5: B 2012 2 4 NA
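6: B 2013 2 4 NA", header = TRUE)
df2 <- df
known <- which(!is.na(df$pred))  # rows holding the starting values
while (any(is.na(df2$pred))) {
  # dplyr::lag shifts the vector down one row; base stats::lag does not
  df2$pred <- df2$alpha + df2$beta * dplyr::lag(df2$pred)
  # restore the starting values overwritten by the shifted update
  df2$pred[known] <- df$pred[known]
}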
The result is correct:
df2
type year alpha beta pred
1: A 2011 1 3 1
2: A 2012 1 3 4
3: A 2013 1 3 13
4: B 2011 2 4 2
5: B 2012 2 4 10
6: B 2013 2 4 42
Upvotes: 0
Reputation: 66819
You can just do the math:
tmp[, pred := pred[1]*beta^(1:.N-1) + alpha*cumsum(c(0, beta[1]^(0:(.N-2)))), by=type]
# type year alpha beta pred
# 1: A 2011 1 3 1
# 2: A 2012 1 3 4
# 3: A 2013 1 3 13
# 4: B 2011 2 4 2
# 5: B 2012 2 4 10
# 6: B 2013 2 4 42
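For readers wondering where the one-liner comes from, here is a short derivation sketch. It assumes alpha and beta are constant within a type, and it uses t = 0, 1, ... as a zero-based row index within the type (t is a symbol introduced here for illustration, not part of the original post).

```latex
\begin{aligned}
\text{pred}_t &= \alpha + \beta\,\text{pred}_{t-1} \\
              &= \alpha + \beta\,(\alpha + \beta\,\text{pred}_{t-2}) \\
              &\;\;\vdots \\
              &= \beta^{t}\,\text{pred}_0 + \alpha \sum_{k=0}^{t-1} \beta^{k}
\end{aligned}
```

The first term corresponds to pred[1]*beta^(1:.N-1), and the running partial sums correspond to the cumsum(...) term. For type A at t = 2, this gives 3^2 * 1 + 1 * (1 + 3) = 13, matching the table.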
Comment. In my opinion, the data structure in the OP is flawed. Alpha and beta are clearly attributes of the type, not values that vary from row to row. It should start with:
typeDT = data.table(
  type       = c("A", "B"),
  year.start = 2011L,
  year.end   = 2013L,
  a          = 1:2,
  b          = 3:4,
  pred0      = 1:2
)
# type year.start year.end a b pred0
# 1: A 2011 2013 1 3 1
# 2: B 2011 2013 2 4 2
With this structure, you could expand to your data set naturally:
typeDT[, {
  year = year.start:year.end
  n = length(year)
  p = pred0*b^(0:(n-1)) + a*cumsum(c(0, b^(0:(n-2))))
  .(year = year, pred = p)
}, by=type]
# type year pred
# 1: A 2011 1
# 2: A 2012 4
# 3: A 2013 13
# 4: B 2011 2
# 5: B 2012 10
# 6: B 2013 42
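The closed form relies on alpha and beta being constant within each type. If they did vary by row in the long table from the question, one possible fallback is a sequential fold per type with base R's Reduce. The sketch below is an alternative technique, not the method from this answer; roll_pred is a hypothetical helper name introduced here, and the code assumes the first row of each type (after sorting by year) holds the starting value.

```r
library(data.table)

tmp <- data.table(type  = c("A", "A", "A", "B", "B", "B"),
                  year  = c(2011, 2012, 2013, 2011, 2012, 2013),
                  alpha = c(1, 1, 1, 2, 2, 2),
                  beta  = c(3, 3, 3, 4, 4, 4),
                  pred  = c(1, NA, NA, 2, NA, NA))

# Hypothetical helper (not from the original posts): walk the rows of one
# group in order, carrying the previous value forward through the recurrence.
roll_pred <- function(alpha, beta, start) {
  idx <- seq_along(alpha)[-1]  # rows 2..n; empty when the group has one row
  unlist(Reduce(function(prev, i) alpha[i] + beta[i] * prev,
                idx, init = start, accumulate = TRUE))
}

setorder(tmp, type, year)  # assumption: rows must be in year order within type
tmp[, pred := roll_pred(alpha, beta, pred[1]), by = type]
print(tmp)
```

On the example data this reproduces pred = 1, 4, 13 for type A and 2, 10, 42 for type B. It still loops over rows inside each group, so it is likely slower than the closed form when the coefficients are constant; the trade-off is that it uses each row's own alpha and beta.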
Upvotes: 3