ricardo

Reputation: 8425

Updating data in lm() calls

Is there an equivalent to update for the data part of an lm call object?

For example, say I have the following model:

dd = data.frame(y=rnorm(100),x1=rnorm(100))
Model_all <- lm(formula = y ~ x1, data = dd)

Is there a way of operating on the lm object to have the equivalent effect of:

Model_1t50 <- lm(formula = y ~ x1, data = dd[1:50,])

I am trying to construct some pseudo out-of-sample forecast tests, and it would be very convenient to have a single lm object and simply roll the data.

Upvotes: 3

Views: 3939

Answers (2)

Ben Bolker

Reputation: 226047

I'm fairly certain that update actually does what you want!

example(lm)
dat1 <- data.frame(group,weight)
lm1 <- lm(weight ~ group, data=dat1)
dat2 <- data.frame(group,weight=2*weight)
lm2 <- update(lm1,data=dat2)
coef(lm1)
## (Intercept)    groupTrt 
##       5.032      -0.371 
coef(lm2)
## (Intercept)    groupTrt 
##     10.064      -0.742 

If you're hoping for an efficiency gain from this, you'll be disappointed -- R just substitutes the new arguments into the original call and re-evaluates it, so the model is refit from scratch (see the code of update.default). But it does make the code a lot cleaner ...
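
For the rolling out-of-sample use case in the question, a minimal sketch might look like the following. The seed, the 50-row starting window, and the names ends and forecasts are assumptions made for illustration; they are not part of the original question.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
dd <- data.frame(y = rnorm(100), x1 = rnorm(100))
Model_all <- lm(formula = y ~ x1, data = dd)

# Same formula, new data: update() swaps the data argument and refits
Model_1t50 <- update(Model_all, data = dd[1:50, ])

# Expanding window (assumed design): fit on rows 1..n, forecast row n + 1
ends <- 50:99
forecasts <- sapply(ends, function(n) {
  fit <- update(Model_all, data = dd[1:n, ])
  predict(fit, newdata = dd[n + 1, , drop = FALSE])
})

# One-step-ahead forecast errors for rows 51..100
errors <- dd$y[ends + 1] - forecasts
```

Each pass through the loop is a full refit, so this is no faster than calling lm directly; the only gain is that the formula lives in one place.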

Upvotes: 8

Michael

Reputation: 5898

biglm objects can be updated to include more data, but not less. So you could do this in the opposite order, starting with less data and adding more. See http://cran.r-project.org/web/packages/biglm/biglm.pdf

However, I suspect you're interested in parameters estimated for subpopulations (i.e., rows 1:50 correspond to level "a" of a factor variable factrvar). In this case, you should use an interaction in your formula (~factrvar*x1) rather than subsetting to data[1:50,]. An interaction of this type gives a different effect estimate for each level of factrvar. This is more efficient than estimating each parameter separately, and it constrains any additional parameters (e.g., x2 in ~factrvar*x1 + x2) to be the same across values of factrvar. If you instead fit the same model separately to different subsets, x2 would receive a separate parameter estimate each time.
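
As a rough sketch of the interaction approach, assuming a made-up data frame dat in which rows 1:50 are level "a" and rows 51:100 are level "b" of factrvar (the data, seed, and object names here are all hypothetical):

```r
set.seed(2)  # assumed seed, only so the sketch is reproducible
dat <- data.frame(
  factrvar = factor(rep(c("a", "b"), each = 50)),
  x1 = rnorm(100),
  x2 = rnorm(100),
  y  = rnorm(100)
)

# One fit: separate intercept and x1 slope per level, a single shared x2 slope
fit_int <- lm(y ~ factrvar * x1 + x2, data = dat)

# x1 slope for level "a" is the main effect; for "b" add the interaction term
slope_a <- coef(fit_int)[["x1"]]
slope_b <- coef(fit_int)[["x1"]] + coef(fit_int)[["factrvarb:x1"]]

# Subsetting instead gives each subset its own x2 estimate
fit_a <- lm(y ~ x1 + x2, data = dat[1:50, ])
fit_b <- lm(y ~ x1 + x2, data = dat[51:100, ])
```

Comparing coef(fit_int)[["x2"]] with the x2 coefficients of fit_a and fit_b shows the difference: the interaction model reports one x2 estimate, the subset fits report two.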

Upvotes: 1
