Reputation: 73
I try to get the mean of the product (interaction) of two variables from a model fitted with lm()
.
N <- 1000
u <- rnorm(N)
x1 <- rnorm(N)
x2 <- 1 + x1 + rnorm(N)
y <- 1 + x1 + x2 + u
df <- data.frame(y,x1,x2)
fit <- lm(y ~ x1 * x2, data = df)
I can calculate the mean of a single variable for a coefficient accessing $model
.
mean(fit$model[,2])
# verify result
mean(df[,2])
But how can I get the mean of the interaction without going back into the data.
# Result should be
mean(df$x1*df$x2)
Upvotes: 0
Views: 1049
Reputation: 174813
I'm not sure why you want this, but it is trivial to get from fit
. First, it is best not to delve into fitted objects like this with $
. Instead learn to use extractor functions. In this case, the equivalent of mean(fit$model[,2])
would be, for all columns of the data at once:
> colMeans(model.frame(fit))
y x1 x2
2.0783225 0.0283555 1.0481141
The model frame is just a copy of the data. What you want is the design matrix, or as R calls it the model matrix, which, unsurprisingly is obtained using the model.matrix()
function.
> head(model.matrix(fit))
(Intercept) x1 x2 x1:x2
1 1 -0.33406119 1.95054087 -0.65160001
2 1 -1.41848058 0.35429591 -0.50256186
3 1 -1.32877702 -0.00783884 0.01041607
4 1 0.54054637 1.34637056 0.72777572
5 1 -0.75686319 -0.36476471 0.27607699
6 1 0.04514449 1.62928315 0.07355316
Note that the response data aren't in the design matrix, but the interaction term is, in the final column. Use colMeans()
again to get the mean of each column of this design matrix:
> colMeans(model.matrix(fit))
(Intercept) x1 x2 x1:x2
1.0000000 0.0283555 1.0481141 1.0820110
For completeness I should show this is correct for my random data set:
> colMeans(transform(df[,-1], interaction = x1 * x2))
x1 x2 interaction
0.0283555 1.0481141 1.0820110
Upvotes: 3
Reputation: 132706
mean(x1 * x2)
#[1] 0.9009494
mean(do.call("*", fit$model[, c("x1", "x2")]))
#[1] 0.9009494
fit <- lm(y ~ x1 * x2, data = df, x=TRUE)
mean(fit$x[,"x1:x2"])
#[1] 0.9009494
Upvotes: 0