Reputation: 55
I have a dataset with 14 binary variables. I've already tested each variable on its own for significance, but I'd also like to check for significant interactions. However, I know that higher-order interactions are unlikely to be significant and would just muddle the model. Is there any way to fit a linear model in R, but tell it to only test interactions among a maximum of 3 variables?
Upvotes: 3
Views: 482
Reputation: 13591
A manual approach: use combn to build every three-feature combination
Comb <- combn(names(iris)[1:4],3)
Output
     [,1]           [,2]           [,3]           [,4]
[1,] "Sepal.Length" "Sepal.Length" "Sepal.Length" "Sepal.Width"
[2,] "Sepal.Width"  "Sepal.Width"  "Petal.Length" "Petal.Length"
[3,] "Petal.Length" "Petal.Width"  "Petal.Width"  "Petal.Width"
Then use as.formula to build a model formula from each combination of 3 features
ans <- apply(Comb, 2, function(x) glm(as.formula(paste0("Species ~ ", paste0(x, collapse=" + "))), data=iris, family=binomial()))
ans
Output
[[1]]
Call: glm(formula = as.formula(paste0("Species ~ ", paste0(x, collapse = " + "))),
family = binomial(), data = iris)
Coefficients:
 (Intercept)  Sepal.Length   Sepal.Width  Petal.Length
       71.80        -23.91        -13.51         34.95
Degrees of Freedom: 149 Total (i.e. Null); 146 Residual
Null Deviance: 191
Residual Deviance: 3.523e-09 AIC: 8
[[2]]
Call: glm(formula = as.formula(paste0("Species ~ ", paste0(x, collapse = " + "))),
family = binomial(), data = iris)
Coefficients:
 (Intercept)  Sepal.Length   Sepal.Width   Petal.Width
     -25.477         6.762       -19.057        59.292
Degrees of Freedom: 149 Total (i.e. Null); 146 Residual
Null Deviance: 191
Residual Deviance: 4.144e-09 AIC: 8
# etc
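Note that joining the names with " + " fits the three main effects only. If the goal is to test interactions within each triplet, one possible variation (a sketch, not part of the original answer) joins the names with " * " instead, which expands to the main effects, the pairwise interactions and the three-way term. The names forms and labels below are illustrative only.

```r
# All triplets of the four numeric iris columns (4 choose 3 = 4 of them).
# For the 14 variables in the question this would be choose(14, 3) = 364.
Comb <- combn(names(iris)[1:4], 3)

# One formula per triplet; " * " requests main effects plus all interactions
# among the three variables (assumption: interactions are what is wanted).
forms <- lapply(seq_len(ncol(Comb)), function(j) {
  as.formula(paste("Species ~", paste(Comb[, j], collapse = " * ")))
})

# Terms that the first formula expands to: 3 main effects,
# 3 two-way interactions and 1 three-way interaction.
labels <- attr(terms(forms[[1]]), "term.labels")
labels
```

Each element of forms can then be passed to glm or lm in place of the formula built with " + ".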
Upvotes: 1
Reputation: 270248
Using the first 5 columns of the built-in anscombe data set:
lm(y1 ~ .^3, anscombe[1:5])
giving:
Call:
lm(formula = y1 ~ .^3, data = anscombe[1:5])
Coefficients:
(Intercept)           x1           x2           x3           x4        x1:x2
   12.81992     -2.60371           NA           NA     -0.16258      0.36279
      x1:x3        x1:x4        x2:x3        x2:x4        x3:x4     x1:x2:x3
         NA           NA           NA           NA           NA     -0.01345
   x1:x2:x4     x1:x3:x4     x2:x3:x4
         NA           NA           NA
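In a formula, .^3 expands to every main effect plus all two-way and three-way interactions, and nothing of higher order. The NA coefficients in the output above arise because x1, x2 and x3 are identical in anscombe and the data set has only 11 rows, so most terms cannot be estimated. As a rough check for the 14-variable case in the question, the sketch below counts the terms that .^3 generates. The data frame d, its column names and the simulated values are made up for illustration.

```r
set.seed(1)

# Hypothetical data: 14 binary predictors x1..x14 and a numeric response y
d <- as.data.frame(matrix(rbinom(1000 * 14, 1, 0.5), ncol = 14))
names(d) <- paste0("x", 1:14)
d$y <- rnorm(1000)

# Expand the formula against the data without fitting anything
tt <- terms(y ~ .^3, data = d)

# Number of terms at each interaction order (1 = main effect)
n_by_order <- table(attr(tt, "order"))
n_by_order
```

With 14 predictors this gives 14 main effects, 91 two-way and 364 three-way interactions, so lm(y ~ .^3, data = d) would estimate 469 terms plus the intercept. Replacing 3 with 2 restricts the model to pairwise interactions.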
Upvotes: 5