andrey

Function to forecast in time series

I'm working in R. I would like to forecast a time series of prevalences from initial values and a set of transition parameters, for data with the following structure:

cohort <- c(1980, 1981, 1982)
A00 <- c(.15, .2, .4)
B00 <- c(.25, .3, .4)
C00 <- c(.6, .5, .2)
Tab <- c(.6, .5, .4)
Tac <- c(.2, .25, .35)
ds <- data.frame(cohort, A00, B00, C00, Tab, Tac)
print(ds)

  cohort  A00  B00 C00 Tab  Tac
1   1980 0.15 0.25 0.6 0.6 0.20
2   1981 0.20 0.30 0.5 0.5 0.25
3   1982 0.40 0.40 0.2 0.4 0.35

Initial values in columns A00, B00, and C00 represent the relative size of each group (A, B, C) at time t = 00. They add up to 1 across each row (A00 + B00 + C00 = 1). Parameters Tab and Tac are used to predict the prevalence at time t+1 with some mathematical model, for example

A01 = df$A00 - df$Tab + df$Tac

The function that computes the predicted values at time t+1 is:

forecast <- function(df) {
  dsResult <- data.frame(
    cohort = df$cohort,
    A01    = df$A00 - df$Tab + df$Tac,
    B01    = df$B00 - df$Tab + df$Tac,
    C01    = df$C00 - df$Tab + df$Tac
  )
  dsResult <- merge(df, dsResult, by = "cohort")
  return(dsResult)
}
new <- forecast(ds)

This produces the following result:

  cohort  A00  B00 C00 Tab  Tac   A01   B01  C01
1   1980 0.15 0.25 0.6 0.6 0.20 -0.25 -0.15 0.20
2   1981 0.20 0.30 0.5 0.5 0.25 -0.05  0.05 0.25
3   1982 0.40 0.40 0.2 0.4 0.35  0.35  0.35 0.15
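Because the example rule subtracts Tab and adds Tac to every group, the three predicted shares no longer sum to 1; their total is 1 - 3 * (Tab - Tac). A short arithmetic check using the values above (total01 is an illustrative name):

```r
cohort <- c(1980, 1981, 1982)
A00 <- c(.15, .2, .4); B00 <- c(.25, .3, .4); C00 <- c(.6, .5, .2)
Tab <- c(.6, .5, .4);  Tac <- c(.2, .25, .35)

# Apply the example rule to each group, as forecast() does
A01 <- A00 - Tab + Tac
B01 <- B00 - Tab + Tac
C01 <- C00 - Tab + Tac

# Row totals after one step: approximately -0.20, 0.25, 0.85 (not 1)
total01 <- A01 + B01 + C01
print(total01)

# Same totals from the closed form 1 - 3 * (Tab - Tac)
print(1 - 3 * (Tab - Tac))
```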

I would very much appreciate help learning how to write a loop that cycles through a desired number of forecast years (for t in 1:7, for instance). Thanks in advance!
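Taken literally, the requested loop could look like the minimal sketch below, which applies the example rule from the question at every step. It assumes Tab and Tac stay the same in every year, and forecastYears is a hypothetical name. Column names such as A00, A01, ... are built with sprintf so that each new column is computed from the one before it.

```r
cohort <- c(1980, 1981, 1982)
ds <- data.frame(cohort,
                 A00 = c(.15, .2, .4), B00 = c(.25, .3, .4), C00 = c(.6, .5, .2),
                 Tab = c(.6, .5, .4), Tac = c(.2, .25, .35))

# Hypothetical helper: adds columns A01..Ann, B01..Bnn, C01..Cnn to df.
# Assumption: the same Tab and Tac are used at every step.
forecastYears <- function(df, nYears) {
  for (step in seq_len(nYears)) {
    for (g in c("A", "B", "C")) {
      prevCol <- sprintf("%s%02d", g, step - 1L)  # e.g. "A00"
      currCol <- sprintf("%s%02d", g, step)       # e.g. "A01"
      df[[currCol]] <- df[[prevCol]] - df$Tab + df$Tac
    }
  }
  df
}

result <- forecastYears(ds, 7)
print(result[, c("cohort", "A01", "A07")])
```

The same loop structure works for any update rule; only the assignment inside the inner loop changes.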


Answers (1)

wibeasley

I'd like to start with two suggestions that might make the problem easier to code. First, revise the data schema so that each year is a unique row and each group is a unique column. Second, since the cohorts are treated as mathematically independent of each other, keep them separate for now, at least until the core of the code is built; put a loop around it later that cycles through them. In the first block of code, there are two matrices: one with observed data, and one that will collect the predicted data.

yearCount <- 7 #Declare the number of time points.
groupCount <- 3 #Declare the number of groups.

#Create fake data that sum to 1 across rows/times.
ob <- matrix(runif(yearCount*groupCount), ncol=groupCount)
ob <- ob / rowSums(ob) #Divide each row by its own total.

#Establish a container to hold the predicted values.
pred <- matrix(NA_real_, ncol=groupCount, nrow=yearCount)

t12<-.5; t13<-.2; t11<-1-t12-t13 #Transition parameters from group 1
t21<-.2; t23<-.4; t22<-1-t21-t23 #Transition parameters from group 2
t31<-.3; t32<-.1; t33<-1-t31-t32 #Transition parameters from group 3

for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, 1]*t11 + ob[i-1, 2]*t21 + ob[i-1, 3]*t31
  pred[i, 2] <- ob[i-1, 1]*t12 + ob[i-1, 2]*t22 + ob[i-1, 3]*t32
  pred[i, 3] <- ob[i-1, 1]*t13 + ob[i-1, 2]*t23 + ob[i-1, 3]*t33
}

#Calculate the squared errors
ss <- (pred[-1, ] - ob[-1, ])^2 #Ignore the first year of data

Inside the loop, you may notice the familiar structure of matrix multiplication. Each assignment can be condensed slightly using an inner product (i.e., one row of the ob matrix is multiplied elementwise by one "column" of the t parameters, and the products are summed). I'm using t12 slightly differently than the Tab in your post: here it is the probability of transitioning from group 1 to group 2 at a given time point.
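Written out with symbols introduced here for illustration (p_{i,k} for ob[i, k], \hat{p}_{i,j} for pred[i, j], and T for the 3 x 3 matrix of t_{kj} values), each iteration of the loop computes the following, and because every row of T and every row of ob sums to 1, the predicted shares also sum to 1:

```latex
\hat{p}_{i,j} = \sum_{k=1}^{3} p_{i-1,k}\, t_{kj}, \quad j = 1, 2, 3
\qquad\Longleftrightarrow\qquad
\hat{\mathbf{p}}_i = \mathbf{p}_{i-1}\, T

\sum_{j} \hat{p}_{i,j} = \sum_{k} p_{i-1,k} \sum_{j} t_{kj} = \sum_{k} p_{i-1,k} = 1
```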

#Create transition parameters that sum to 1 across rows/groups.
tt <-  matrix(runif(groupCount*groupCount), ncol=groupCount)
tt <- tt / rowSums(tt)

Pretend the tt matrix had been defined earlier, in place of the separate variables t11, ..., t33.
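To make that concrete, the hand-set values t11, ..., t33 can be packed into tt. This is a minimal sketch assuming rows index the source group and columns the destination group, which matches the definition of t12 above; prev and onePerColumn are illustrative names.

```r
t12 <- .5; t13 <- .2; t11 <- 1 - t12 - t13  # Transitions out of group 1
t21 <- .2; t23 <- .4; t22 <- 1 - t21 - t23  # Transitions out of group 2
t31 <- .3; t32 <- .1; t33 <- 1 - t31 - t32  # Transitions out of group 3

# Row k holds the transitions out of group k, so fill by row.
tt <- matrix(c(t11, t12, t13,
               t21, t22, t23,
               t31, t32, t33), nrow = 3, byrow = TRUE)

# Column j of tt is (t1j, t2j, t3j): the weights used for pred[i, j].
prev <- c(0.15, 0.25, 0.60)  # one row of ob (borrowed from the 1980 cohort)
onePerColumn <- c(prev %*% tt[, 1], prev %*% tt[, 2], prev %*% tt[, 3])
print(onePerColumn)
```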

for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, ] %*% tt[, 1] 
  pred[i, 2] <- ob[i-1, ] %*% tt[, 2]
  pred[i, 3] <- ob[i-1, ] %*% tt[, 3]
}

The loop's contents are slightly cleaner than when each pair of elements was multiplied and summed explicitly. But we don't have to treat each row/column pair individually: a row of the ob matrix can be multiplied by all three columns of the tt matrix at once:

for( i in 2:yearCount ) {
  pred[i, ] <- ob[i-1, ] %*% tt
}

This should be faster than the previous version, because R's replacement function may copy the pred matrix once per row instead of three times per row. To fill the whole result in one step, apply the product to every row of ob with the apply function; apply returns each row's result as a column, so transpose the output to put years back in rows. Finally, notice that the rows of this result are offset by one year from pred (i.e., row i-1 here corresponds to row i in pred), and its last row is a prediction for the year after the final observation.

predictionWithExtraYear <- t(apply(ob, 1, FUN=function(row){row %*% tt}))
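The apply call gives the same numbers as the single matrix product ob %*% tt, which needs neither a loop nor apply, and dropping that product's last row reproduces pred without its empty first row. The sketch below checks both claims on random data; set.seed, viaApply, and viaProduct are additions for illustration.

```r
set.seed(1)  # added for reproducibility
yearCount <- 7
groupCount <- 3

ob <- matrix(runif(yearCount * groupCount), ncol = groupCount)
ob <- ob / rowSums(ob)
tt <- matrix(runif(groupCount * groupCount), ncol = groupCount)
tt <- tt / rowSums(tt)

# Row-by-row loop from the answer
pred <- matrix(NA_real_, ncol = groupCount, nrow = yearCount)
for (i in 2:yearCount) {
  pred[i, ] <- ob[i - 1, ] %*% tt
}

# apply() version: one column per row of ob, hence t()
viaApply <- t(apply(ob, 1, function(row) row %*% tt))

# Single matrix product covering every row at once
viaProduct <- ob %*% tt

print(all.equal(viaApply, viaProduct))
```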

To accommodate cohorts, you could declare a list with three elements (for the 1980, 1981, and 1982 cohorts), each holding that cohort's own ob matrix, and a second list holding a separate pred matrix for each cohort. Alternatively, use a three-dimensional array (though that may be more costly when R copies memory in the replacement function).
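A minimal sketch of the list approach that also produces the multi-year forecast the question asks for. Assumptions: every cohort shares one transition matrix (built here from the t11, ..., t33 values used earlier), each step feeds the previous forecast back in rather than an observed row, and forecastCohort and predList are hypothetical names. Each list element is a matrix with years in rows and groups in columns, as suggested at the start of this answer.

```r
ds <- data.frame(cohort = c(1980, 1981, 1982),
                 A00 = c(.15, .2, .4), B00 = c(.25, .3, .4), C00 = c(.6, .5, .2))

# Hypothetical transition matrix shared by all cohorts (rows sum to 1).
tt <- matrix(c(.3, .5, .2,
               .2, .4, .4,
               .3, .1, .6), nrow = 3, byrow = TRUE)
yearsAhead <- 7

# Row 1 holds t = 0; row t + 1 holds the state after t steps.
forecastCohort <- function(initial, tt, yearsAhead) {
  out <- matrix(NA_real_, nrow = yearsAhead + 1, ncol = length(initial),
                dimnames = list(paste0("t", 0:yearsAhead), c("A", "B", "C")))
  out[1, ] <- initial
  for (step in seq_len(yearsAhead)) {
    out[step + 1, ] <- out[step, ] %*% tt  # previous forecast times tt
  }
  out
}

# One matrix per cohort, named by cohort year.
predList <- lapply(seq_len(nrow(ds)), function(r) {
  forecastCohort(unlist(ds[r, c("A00", "B00", "C00")]), tt, yearsAhead)
})
names(predList) <- ds$cohort

print(round(predList[["1980"]], 3))
```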
