stat_student

Reputation: 827

Prediction Components in R Linear Regression

I was wondering how to get the actual components from predict(..., type = "terms"). I know that if I take the rowSums and add the attr(, "constant") value to each, I get the predicted values, but I'm not sure how this attr(, "constant") is split up between the columns. All in all, how do I alter the matrix returned by predict so that each value represents the model coefficient multiplied by the prediction data? The result should be a matrix (or data.frame) with the same dimensions as the one returned by predict, but whose rowSums add up to the predicted values with no further alteration needed.

Note: I realize I could probably take the coefficients produced by the model and matrix multiply them with my prediction matrix but I'd rather not do it that way to avoid any problems that factors could produce.

Edit: The goal of this question is not to produce a way of summing the rows to get the predicted values; that was just meant as a sanity check.

If I have the equation y = 2*a + 3*b + c and my predicted value is 500, I want to know what 2*a was, what 3*b was, and what c was at that particular point. Right now I feel like these values are being returned by predict but they've been scaled. I need to know how to un-scale them.

Upvotes: 1

Views: 1468

Answers (2)

Rorschach

Reputation: 32466

It's not split up between the columns; it plays the role of the intercept. If you include an intercept in the model, predict centers each term column, and the constant is then the mean of the fitted values. For example,

## With intercept
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data=iris)
tt <- predict(fit, type="terms")
pp <- predict(fit)
attr(tt, "constant")
# [1] 5.843333
attr(scale(pp, scale = FALSE), "scaled:center")
# [1] 5.843333
## or
mean(pp)
# [1] 5.843333

If you fit the model without an intercept, the constant is 0, so the rowSums of the matrix equal the predictions directly.

## Without intercept
fit1 <- lm(Sepal.Length ~ Sepal.Width + Species - 1, data=iris)
tt1 <- predict(fit1, type="terms")
attr(tt1, "constant")
# [1] 0

all.equal(rowSums(tt1), predict(fit1))
## [1] TRUE

Centering the response (subtracting its mean) changes only the intercept and leaves the other coefficients untouched. When there is no intercept, predict does no centering.

fit2 <- lm(scale(Sepal.Length, scale = FALSE) ~ Sepal.Width + Species, data = iris)
all.equal(coef(fit2)[-1], coef(fit)[-1])
## [1] TRUE
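
To get the un-centered contribution of each term (the "2*a" and "3*b" pieces from the question), one possible approach is to add back the mean contribution that predict subtracted from each column. The sketch below assumes an lm fit with an intercept and no aliased (NA) coefficients; the names avg_contrib, term_means and contrib are made up for illustration. The intercept has to go somewhere, so it gets its own column.

```r
# Assumption: an lm fit with an intercept and no aliased (NA) coefficients.
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
tt  <- predict(fit, type = "terms")

X    <- model.matrix(fit)     # training design matrix, factors already expanded
beta <- coef(fit)
asgn <- attr(X, "assign")     # 0 = intercept; 1, 2, ... = term index of each column

# Average contribution of each coefficient over the training data,
# summed per term: this is what predict() subtracted from each column.
avg_contrib <- colMeans(X) * beta
term_means  <- vapply(seq_len(ncol(tt)),
                      function(j) sum(avg_contrib[asgn == j]),
                      numeric(1))

# Add the per-term means back and drop the now misleading attribute.
uncentered <- sweep(tt, 2, term_means, "+")
attr(uncentered, "constant") <- NULL

# Put the intercept in its own column so the rows sum to the predictions.
contrib <- cbind("(Intercept)" = unname(beta["(Intercept)"]), uncentered)

all.equal(unname(rowSums(contrib)), unname(predict(fit)))
```

As far as I can tell, predict centers newdata with the training means as well, so the same term_means should apply to predict(fit, newdata = ..., type = "terms").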

Upvotes: 1

frank2165

Reputation: 114

As far as I know, the constant is stored as an attribute to save memory. If you want rowSums to give the correct predicted values, you need to either add an extra column containing the constant or add the constant to the output of rowSums (see the unnecessarily verbose example below).

rowSums_lm <- function(A) {
  if (!is.matrix(A) || is.null(attr(A, "constant"))) {
    stop("Input must be a matrix with a 'constant' attribute")
  }
  rowSums(A) + attr(A, "constant")
}
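
The other option mentioned above, an extra column holding the constant, could look roughly like this. It is only a sketch on the iris data; the column name "constant" and the object with_const are arbitrary choices for illustration.

```r
fit <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
tt  <- predict(fit, type = "terms")

# Append the constant as one more column (the scalar is recycled down the rows),
# so that plain rowSums() reproduces the predictions.
with_const <- cbind(tt, constant = attr(tt, "constant"))

all.equal(unname(rowSums(with_const)), unname(predict(fit)))
```

Note that the term columns are still centered here; only the row totals match the predictions.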

Upvotes: 0
