Reputation: 359
please consider following data:
y<- c(2,2,6,3,2,23,5,6,4,23,3,4,3,87,5,7,4,23,3,4,3,87,5,7)
x1<- c(3,4,6,3,3,23,5,6,4,23,6,5,5,1,5,7,2,23,6,5,5,1,5,7)
x2<- c(7,3,6,3,2,2,5,2,2,2,2,2,6,5,4,3,2,3,2,2,6,5,4,3)
type <- c("a","a","a","a","a","a","a","a","b","b","b","b","b","b","b","b","c","c","c","c","c","c","c","c")
generation<- c(1,1,1,1,2,2,3,3,1,2,2,2,3,3,4,4,1,2,2,2,3,3,4,4)
year<- c(2004,2005,2006,2007,2008,2009,2010,2011,2004,2005,2006,2007,2008,2009,2010,2011,2004,2005,2006,2007,2008,2009,2010,2011)
data <- data.frame(y,x1,x2,model,generation,year)
I would now make analysis that only take into account each single year and predict on the following. So in essence, this would run several separate analysis, only taking into account the data up to one point in time and then predicting on the next (only the directly next) period.
I tried to set up an example for the three models:
data2004 <- subset(data, year==2004)
data2005 <- subset(data, year==2005)
m1 <- lm(y~x1+x2, data=data2004)
preds <- predict(m1, data2005)
How can I do this automatically? My preferred output would be a predicted value for each type that indicates what the value would have been for each of the values that exist in the following period (the original data has 200 periods).
Thanks in advance, help very much appreciated!
Upvotes: 0
Views: 417
Reputation: 476
The following may be more like what you want.
uq.year <- sort(unique(dat$year)) ## sorting so that i+1 element is the year after ith element
year <- dat$year
dat$year <- NULL ## we want everything in dat to be either the response or a predictor
model <- rep(c("a", "b", "c"), times = length(year) / 3) ## identifies the separate people per year
predlist <- vector("list", length(uq.year) - 1) ## there is 1 prediction fewer than the number of unique years
for(i in 1:(length(uq.year) - 1))
{
mod <- lm(y ~ ., data = subset(dat, year == uq.year[i]))
predlist[[i]] <- predict(mod, subset(dat, subset = year == uq.year[i + 1], select = -y))
names(predlist[[i]]) <- model[year == uq.year[i + 1]] ## labeling each prediction
}
The reason that we want dat
to only have modeling variables (rather than year
, for example) is because then we can easily use the y ~ .
notation and avoid having to spell out all of the predictors in the lm
call.
Upvotes: 1