tobias sch

Reputation: 359

Get predicted values for next period

Please consider the following data:

y<- c(2,2,6,3,2,23,5,6,4,23,3,4,3,87,5,7,4,23,3,4,3,87,5,7)
x1<- c(3,4,6,3,3,23,5,6,4,23,6,5,5,1,5,7,2,23,6,5,5,1,5,7)
x2<- c(7,3,6,3,2,2,5,2,2,2,2,2,6,5,4,3,2,3,2,2,6,5,4,3)

type <- c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","c","c","c","c","c","c","c","c")
generation<- c(1,1,1,1,2,2,3,3,1,2,2,2,3,3,4,4,1,2,2,2,3,3,4,4)
year<-         c(2004,2005,2006,2007,2008,2009,2010,2011,2004,2005,2006,2007,2008,2009,2010,2011,2004,2005,2006,2007,2008,2009,2010,2011)
data <- data.frame(y, x1, x2, type, generation, year)

I would now like to run an analysis that takes each single year into account and predicts for the following one. In essence, this means running several separate analyses, each using only the data up to one point in time and then predicting for the next (only the directly following) period.

I tried to set up an example for the three models:

data2004 <- subset(data, year==2004)
data2005 <- subset(data, year==2005)
m1 <- lm(y~x1+x2, data=data2004)
preds <- predict(m1, data2005)

How can I do this automatically? My preferred output would be a predicted value for each type, indicating what the value would have been for each of the observations that exist in the following period (the original data has 200 periods).

Thanks in advance, help very much appreciated!

Upvotes: 0

Views: 417

Answers (1)

jld

Reputation: 476

The following may be more like what you want.

## dat is assumed to hold the question's y, x1, x2 and year columns, with rows grouped by type
uq.year <- sort(unique(dat$year)) ## sorted so that element i + 1 is the year after element i
year <- dat$year
dat$year <- NULL ## we want everything in dat to be either the response or a predictor

model <- rep(c("a", "b", "c"), each = length(year) / 3) ## type label of each row (each, not times, because rows are grouped by type)

predlist <- vector("list", length(uq.year) - 1) ## there is one set of predictions fewer than the number of unique years

for(i in seq_len(length(uq.year) - 1))
{
  ## year is found in the workspace here, because dat no longer has a year column
  mod <- lm(y ~ ., data = subset(dat, year == uq.year[i]))
  predlist[[i]] <- predict(mod, subset(dat, subset = year == uq.year[i + 1], select = -y))
  names(predlist[[i]]) <- model[year == uq.year[i + 1]] ## labeling each prediction
}

The reason we want dat to contain only modeling variables (and not year, for example) is that we can then use the y ~ . notation and avoid spelling out all of the predictors in the lm call.
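For comparison, here is a minimal self-contained sketch of the same year-by-year idea that spells out the formula y ~ x1 + x2 from the question instead of using y ~ . and returns one data frame rather than a list. The helper name predict_next and the results layout are assumptions made for illustration, and the data is the question's data written compactly with rep(). Some years have too little variation in x1 or x2 to estimate every coefficient, so predict() may warn about a rank-deficient fit; the sketch silences that warning, which may not be what you want on real data.

```r
# Data from the question; type and year are written compactly with rep().
y  <- c(2,2,6,3,2,23,5,6,4,23,3,4,3,87,5,7,4,23,3,4,3,87,5,7)
x1 <- c(3,4,6,3,3,23,5,6,4,23,6,5,5,1,5,7,2,23,6,5,5,1,5,7)
x2 <- c(7,3,6,3,2,2,5,2,2,2,2,2,6,5,4,3,2,3,2,2,6,5,4,3)
type <- rep(c("a", "b", "c"), each = 8)
year <- rep(2004:2011, times = 3)
dat  <- data.frame(y, x1, x2, type, year)

years <- sort(unique(dat$year))

# Hypothetical helper: fit on year i only, then predict the rows of year i + 1.
predict_next <- function(i) {
  train <- dat[dat$year == years[i], ]
  test  <- dat[dat$year == years[i + 1], ]
  fit   <- lm(y ~ x1 + x2, data = train)
  data.frame(
    year      = test$year,
    type      = test$type,
    observed  = test$y,
    # rank-deficient years trigger a warning; it is suppressed in this sketch
    predicted = suppressWarnings(predict(fit, newdata = test))
  )
}

# One row per type and predicted year, stacked into a single data frame.
results <- do.call(rbind, lapply(seq_len(length(years) - 1), predict_next))
```

If "up to one point in time" is meant as an expanding window rather than a single year, changing the training filter to dat$year <= years[i] should give that variant.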

Upvotes: 1
