Hani Abbas

Reputation: 11

Run all possible interactions in GLM regression using R

I have a large dataset of medical insurance claims to which I want to fit a GLM. I have four categorical predictors: Gender, Age group, Nationality, and Room Type (VIP, normal, etc.).

My base GLM includes the intercept and these four variables. I now want to add two-way interactions, but I am not sure which of them are significant for the model. So I want to fit every possible combination of interactions together with the four main effects, and then compare the resulting models on a criterion such as AIC, BIC, or R-squared.

Is there a function or an easy way in R to fit all of these models and save their AIC/BIC/R-squared without writing a separate glm call for each one?
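Note that glm() does not report an R-squared. One commonly used substitute is the proportion of deviance explained, 1 - residual deviance / null deviance, which for a gaussian GLM with identity link equals the ordinary R-squared. A minimal sketch, using the built-in mtcars data purely as an illustration (deviance_r2 is a hypothetical helper name, not a base R function):

```r
# Proportion of deviance explained: 1 - residual deviance / null deviance.
# For a gaussian GLM with identity link this equals the ordinary R-squared.
deviance_r2 <- function(fit) {
  1 - fit$deviance / fit$null.deviance
}

# Illustration on the built-in mtcars data (not the claims data)
fit_glm <- glm(mpg ~ wt + factor(cyl), data = mtcars)
r2_glm  <- deviance_r2(fit_glm)
r2_lm   <- summary(lm(mpg ~ wt + factor(cyl), data = mtcars))$r.squared
print(c(glm = r2_glm, lm = r2_lm))
```

For non-gaussian families this is only a pseudo-R-squared, so AIC or BIC are usually the more natural criteria for comparing the candidate models.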

A few examples of the models to run would be:

 1. intercept + Gender + Age + Nationality + RoomType
 2. intercept + Gender + Age + Nationality + RoomType + Gender*Age
 3. intercept + Gender + Age + Nationality + RoomType + Gender*Nationality
 4. intercept + Gender + Age + Nationality + RoomType + Gender*RoomType
 5. intercept + Gender + Age + Nationality + RoomType + Age*Nationality
 6. intercept + Gender + Age + Nationality + RoomType + Age*RoomType
 7. intercept + Gender + Age + Nationality + RoomType + Nationality*RoomType
 8. intercept + Gender + Age + Nationality + RoomType + Gender*Age + Gender*Nationality

and so on.
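The list grows quickly. Four predictors give choose(4, 2) = 6 two-way interactions, and each model may contain any subset of them, so there are 2^6 = 64 candidate models in total (including the base model). A short R check of that count:

```r
vars  <- c("Gender", "Age", "Nationality", "RoomType")
# Each unordered pair of predictors gives one two-way interaction term
pairs <- combn(vars, 2, FUN = paste, collapse = ":")
n_int <- length(pairs)

# Every model keeps the four main effects plus any subset of the
# interactions, from none (the base model) up to all of them
n_models <- sum(choose(n_int, 0:n_int))
cat(n_int, "interactions,", n_models, "candidate models\n")
```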

Upvotes: 1

Views: 3399

Answers (1)

nya

Reputation: 2250

Let's first generate all two-way interaction terms, one per unordered pair of variables.

vars <- c("Gender", "Age", "Nationality", "RoomType")
# combn() yields each unordered pair once (Gender:Age, Gender:Nationality, ...),
# so no interaction is duplicated as both A:B and B:A
i.vars <- combn(vars, 2, FUN = paste, collapse = ":")
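Why unordered pairs matter: expand.grid() over the same vector twice lists every pair in both orders, while R treats Gender:Age and Age:Gender as one and the same model term. The sketch below counts both ways (ord_pairs is just an illustrative name):

```r
vars <- c("Gender", "Age", "Nationality", "RoomType")

# Ordered pairs, as produced by expand.grid(): each interaction appears twice
ord_pairs <- expand.grid(vars, vars, stringsAsFactors = FALSE)
ord_pairs <- ord_pairs[ord_pairs[, 1] != ord_pairs[, 2], ]
n_ordered <- nrow(ord_pairs)

# Unordered pairs from combn(): each interaction appears once
n_unordered <- ncol(combn(vars, 2))

# R collapses Gender:Age and Age:Gender into a single model term
n_terms <- length(attr(terms(y ~ Gender:Age + Age:Gender), "term.labels"))

cat(n_ordered, "ordered pairs,", n_unordered, "unique interactions,",
    n_terms, "term for Gender:Age + Age:Gender\n")
```

With ordered pairs, the number of interaction subsets jumps from 2^6 to 2^12, and most of the extra models are exact duplicates of one another.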

Then, collect every subset of the interactions, from single terms up to all of them at once (inspiration here).

combs.vars <- list()
for (k in seq_along(i.vars)) {
  # combn(..., simplify = FALSE) returns each size-k subset as a character vector
  combs.vars <- c(combs.vars, combn(i.vars, k, simplify = FALSE))
}
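A subset-generation step like this is easy to sanity-check on a small input. The sketch below (the three-term vector terms3 is a made-up example) builds all subsets of three interactions with combn(..., simplify = FALSE), prepends the empty set for the base model, and confirms there are 2^3 = 8 distinct subsets:

```r
terms3 <- c("Gender:Age", "Gender:Nationality", "Age:Nationality")

# All subsets of the interaction terms: size 0 (base model) up to all three
subsets <- c(list(character(0)),
             unlist(lapply(seq_along(terms3),
                           function(k) combn(terms3, k, simplify = FALSE)),
                    recursive = FALSE))

sizes  <- vapply(subsets, length, integer(1))
labels <- vapply(subsets, paste, character(1), collapse = "+")
print(table(sizes))
```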

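Finally, build a formula from each combination, fit a GLM to it, and store its AIC and BIC. The base model without any interactions is fitted as well.

base <- c("Gender", "Age", "Nationality", "RoomType")
# Prepend an empty set so the base model (no interactions) is fitted too
model.sets <- c(list(character(0)), combs.vars)

res <- NULL
for (i in seq_along(model.sets)) {
  # reformulate() avoids a dangling "+" when there are no interaction terms
  f <- reformulate(c(base, model.sets[[i]]), response = "response")
  fit <- glm(f, data = input.data)
  res <- rbind(res, data.frame(formula = paste(deparse(f), collapse = " "),
                               AIC = AIC(fit), BIC = BIC(fit)))
}

res <- res[order(res$AIC), ]  # best (lowest AIC) model first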

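Note that response and input.data must be replaced with the name of your response variable and your data frame, respectively. Also, glm() uses family = gaussian() by default; for claim amounts or claim counts you will likely want a different family (for example Gamma() or poisson()).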
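Instead of editing names in the script, the steps can be wrapped in a function that takes the data, the response name, and the family as arguments. The sketch below is one way to do that under stated assumptions: the function name fit_all_interactions, the simulated claims data frame, the amount response, and the Gamma family with log link are all hypothetical choices for illustration, not part of the original answer.

```r
# Hypothetical helper: fit every model with the main effects of `vars`
# plus each subset of their two-way interactions, and return AIC/BIC.
fit_all_interactions <- function(data, response, vars, family = gaussian()) {
  pairs <- combn(vars, 2, FUN = paste, collapse = ":")
  # Empty set first (base model), then all non-empty subsets
  subsets <- c(list(character(0)),
               unlist(lapply(seq_along(pairs),
                             function(k) combn(pairs, k, simplify = FALSE)),
                      recursive = FALSE))
  rows <- lapply(subsets, function(s) {
    f <- reformulate(c(vars, s), response = response)
    fit <- glm(f, data = data, family = family)
    data.frame(formula = paste(deparse(f), collapse = " "),
               n_interactions = length(s),
               AIC = AIC(fit), BIC = BIC(fit))
  })
  res <- do.call(rbind, rows)
  res[order(res$AIC), ]  # lowest AIC first
}

# Simulated stand-in for the claims data (all names and values are made up)
set.seed(1)
n <- 500
claims <- data.frame(
  Gender      = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age         = factor(sample(c("18-35", "36-55", "56+"), n, replace = TRUE)),
  Nationality = factor(sample(c("Local", "Expat"), n, replace = TRUE)),
  RoomType    = factor(sample(c("Normal", "VIP"), n, replace = TRUE))
)
# Positive, skewed claim amounts: mean 1000, or 1500 for VIP rooms
claims$amount <- rgamma(n, shape = 2,
                        rate = 2 / (1000 + 500 * (claims$RoomType == "VIP")))

res <- fit_all_interactions(claims, "amount",
                            c("Gender", "Age", "Nationality", "RoomType"),
                            family = Gamma(link = "log"))
head(res, 3)
```

Sorting by BIC instead (res[order(res$BIC), ]) will typically favour models with fewer interactions, since BIC penalises extra parameters more heavily than AIC on a dataset of this size.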

Upvotes: 2
