Reputation: 8425
Is there an equivalent to update for the data part of an lm call object?
For example, say I have the following model:
dd = data.frame(y=rnorm(100),x1=rnorm(100))
Model_all <- lm(formula = y ~ x1, data = dd)
Is there a way of operating on the lm object to have the equivalent effect of:
Model_1t50 <- lm(formula = y ~ x1, data = dd[1:50,])
I am trying to construct some pseudo out-of-sample forecast tests, and it would be very convenient to have a single lm object and simply roll the data.
Upvotes: 3
Views: 3939
Reputation: 226047
I'm fairly certain that update actually does what you want!
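example(lm)  # runs the lm help example, which defines group and weight in the workspace
dat1 <- data.frame(group, weight)
lm1 <- lm(weight ~ group, data = dat1)
dat2 <- data.frame(group, weight = 2 * weight)  # same design, response doubled
lm2 <- update(lm1, data = dat2)  # refit the same call on the new data
coef(lm1)
## (Intercept)    groupTrt
##       5.032      -0.371
coef(lm2)
## (Intercept)    groupTrt
##      10.064      -0.742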
If you're hoping for an efficiency gain from this, you'll be disappointed -- R just substitutes the new arguments and re-evaluates the call (see the code of update.default). But it does make the code a lot cleaner ...
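Applied to the data in the question, this could look roughly like the sketch below. The seed, the expanding-window scheme (fit on rows 1:n, forecast row n + 1), and the names window_ends, forecasts, and errors are assumptions made for illustration; they are not from the question.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dd <- data.frame(y = rnorm(100), x1 = rnorm(100))
model_all <- lm(y ~ x1, data = dd)

# Refit the same call on rows 1:50 by swapping only the data argument
model_1t50 <- update(model_all, data = dd[1:50, ])

# Assumed scheme: expanding window, one-step-ahead forecast of the next row
window_ends <- 50:99
forecasts <- numeric(length(window_ends))
for (i in seq_along(window_ends)) {
  n <- window_ends[i]
  fit <- update(model_all, data = dd[1:n, ])  # full refit on rows 1:n
  forecasts[i] <- predict(fit, newdata = dd[n + 1, , drop = FALSE])
}

# Pseudo out-of-sample forecast errors
errors <- dd$y[window_ends + 1] - forecasts
```

Each pass through the loop is a complete refit, so this tidies the code but does not save computation.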
Upvotes: 8
Reputation: 5898
biglm objects can be updated to include more data, but not less. So you could do this in the opposite order, starting with less data and adding more. See http://cran.r-project.org/web/packages/biglm/biglm.pdf
However, I suspect you're interested in parameters estimated for subpopulations (i.e., rows 1:50 correspond to level "a" of the factor variable factrvar). In this case, you should use an interaction in your formula (~factrvar*x1) rather than subsetting to data[1:50,]. An interaction of this type gives different effect estimates for each level of factrvar. This is more efficient than estimating each parameter separately, and it constrains any additional parameters (e.g., x2 in ~factrvar*x1 + x2) to be the same across values of factrvar; if you fit the same model separately to different subsets, x2 would receive a separate parameter estimate each time.
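A minimal sketch of the interaction approach, assuming simulated data in which rows 1:50 are level "a" and rows 51:100 are level "b" (the data and the names fit_int, fit_a, and fit_b are made up for illustration):

```r
set.seed(2)  # arbitrary seed, only so the sketch is reproducible
dd <- data.frame(
  y = rnorm(100),
  x1 = rnorm(100),
  factrvar = factor(rep(c("a", "b"), each = 50))  # assumed grouping of the rows
)

# One model with a separate intercept and x1 slope for each level of factrvar
fit_int <- lm(y ~ factrvar * x1, data = dd)

# Separate fits on each subset, for comparison
fit_a <- lm(y ~ x1, data = dd[1:50, ])
fit_b <- lm(y ~ x1, data = dd[51:100, ])

# "a" is the reference level, so its intercept and slope are the main-effect terms
coef(fit_int)[c("(Intercept)", "x1")]
coef(fit_a)

# The slope for "b" is the x1 main effect plus the interaction term
slope_b <- unname(coef(fit_int)["x1"] + coef(fit_int)["factrvarb:x1"])
slope_b
coef(fit_b)["x1"]
```

With only the full interaction in the formula, the point estimates match the per-subset fits. The pooled model differs once a shared term such as x2 is added, or when comparing standard errors, since it estimates a single residual variance.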
Upvotes: 1