Reputation: 11
I have a large dataset of medical insurance claims to which I want to fit a GLM. I have 4 categorical predictor variables: Gender, Age group, Nationality, and Room Type (VIP, normal, etc.).
My basic GLM will include the intercept term and these 4 variables. I now want to introduce two-way interactions, but I am not sure which interactions are significant for the model and which are not. To find out, I want to fit every possible combination of the interactions along with the 4 base predictors and then compare the models on a criterion such as AIC, BIC, or R-squared.
Is there a function or an easy way in R to fit all the possible interaction models and save their AIC/BIC/R-squared without having to write out the glm call for each one?
A few examples of the models to run would be:
1. intercept + Gender + Age + Nationality + RoomType
2. intercept + Gender + Age + Nationality + RoomType + Gender*Age
3. intercept + Gender + Age + Nationality + RoomType + Gender*Nationality
4. intercept + Gender + Age + Nationality + RoomType + Gender*RoomType
5. intercept + Gender + Age + Nationality + RoomType + Age*Nationality
6. intercept + Gender + Age + Nationality + RoomType + Age*RoomType
7. intercept + Gender + Age + Nationality + RoomType + Nationality*RoomType
8. intercept + Gender + Age + Nationality + RoomType + Gender*Age + Gender*Nationality
and so on.
Upvotes: 1
Views: 3399
Reputation: 2250
Let's first generate some combinations of variable names.
vars <- c("Gender", "Age", "Nationality", "RoomType")
# combn() returns each unordered pair once (6 pairs in total), so
# Gender*Age and its mirror image Age*Gender are not both generated
comb.vars <- t(combn(vars, 2))
i.vars <- apply(comb.vars, 1, paste, collapse = "*")
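As a side check on the notation, here is a small sketch showing that, once the main effects are already in the formula, a*b adds the same single term as a:b, and that the order of the two variables does not matter. The names y, a and b are placeholders for illustration only, not variables from the question.

```r
# Placeholder names; no data is needed just to inspect the terms of a formula.
f.star  <- y ~ a + b + a*b   # a*b expands to a + b + a:b
f.colon <- y ~ a + b + a:b   # a:b is the interaction term only
f.rev   <- y ~ a + b + b*a   # same pair, reversed order

labels.star  <- attr(terms(f.star),  "term.labels")
labels.colon <- attr(terms(f.colon), "term.labels")
labels.rev   <- attr(terms(f.rev),   "term.labels")

print(labels.star)
print(identical(labels.star, labels.colon))
print(identical(labels.star, labels.rev))
```

So only one term per unordered pair of predictors needs to be considered.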
Then, let's combine the interactions into batches of exhaustive combinations (inspiration here).
# All non-empty subsets of the interactions, from single terms up to all of them
combs.vars <- list()
for (k in seq_along(i.vars)) {
  combs <- combn(i.vars, k, simplify = FALSE)
  combs.vars <- c(combs.vars, combs)
}
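To get a feel for the size of the search, here is a quick sanity check, written as a sketch that assumes the six unordered pairwise interactions of the four predictors. Every non-empty subset of 6 terms gives 2^6 - 1 = 63 models, or 64 once the main-effects-only model is included. The object names pairs, subsets and n.subsets are introduced only for this illustration.

```r
vars <- c("Gender", "Age", "Nationality", "RoomType")
# One interaction term per unordered pair of predictors
pairs <- apply(combn(vars, 2), 2, paste, collapse = "*")

# All non-empty subsets of the interaction terms, of sizes 1 to length(pairs)
subsets <- unlist(lapply(seq_along(pairs),
                         function(k) combn(pairs, k, simplify = FALSE)),
                  recursive = FALSE)

n.subsets <- length(subsets)
print(n.subsets)                 # models containing at least one interaction
print(table(lengths(subsets)))   # number of models per number of interactions
```

The count doubles with every extra interaction term, so this brute-force approach is only practical for a handful of predictors.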
Last, let's create formulas out of the combinations and run GLM on them.
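base <- "response ~ Gender + Age + Nationality + RoomType"
# First formula is the main-effects-only model, then one per interaction subset
forms <- c(base, vapply(combs.vars, function(x)
  paste(base, "+", paste(x, collapse = " + ")), character(1), USE.NAMES = FALSE))
res <- data.frame(formula = forms, AIC = NA_real_, BIC = NA_real_)
for (i in seq_along(forms)) {
  # glm() defaults to family = gaussian; set family to suit your claims data
  fit <- glm(as.formula(forms[i]), data = input.data)
  res$AIC[i] <- AIC(fit)
  res$BIC[i] <- BIC(fit)
}
# Lowest AIC first
res <- res[order(res$AIC), ]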
Note that response and input.data need to be replaced with the actual names of your response variable and of the data.frame holding your data.
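For anyone who wants to try the approach before touching real data, here is a self-contained sketch on simulated data. Everything about the data is an assumption made for illustration: the factor levels, the sample size, the gamma-distributed response, and the choice of family = Gamma(link = "log"), which is one common option for positive, right-skewed amounts and not necessarily the right family for your claims.

```r
set.seed(1)
n <- 500
# Simulated stand-in for the claims data; all levels and values are made up.
input.data <- data.frame(
  Gender      = factor(sample(c("F", "M"), n, replace = TRUE)),
  Age         = factor(sample(c("18-30", "31-50", "51+"), n, replace = TRUE)),
  Nationality = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  RoomType    = factor(sample(c("VIP", "normal"), n, replace = TRUE))
)
# Positive, right-skewed response that does not depend on the predictors
input.data$response <- rgamma(n, shape = 2, rate = 0.01)

vars <- c("Gender", "Age", "Nationality", "RoomType")
i.vars <- apply(combn(vars, 2), 2, paste, collapse = "*")
combs.vars <- unlist(lapply(seq_along(i.vars),
                            function(k) combn(i.vars, k, simplify = FALSE)),
                     recursive = FALSE)

# Main-effects-only model first, then one formula per interaction subset
base <- paste("response ~", paste(vars, collapse = " + "))
forms <- c(base, vapply(combs.vars, function(x)
  paste(base, "+", paste(x, collapse = " + ")), character(1), USE.NAMES = FALSE))

res <- data.frame(formula = forms, AIC = NA_real_, BIC = NA_real_)
for (i in seq_along(forms)) {
  fit <- glm(as.formula(forms[i]), family = Gamma(link = "log"),
             data = input.data)
  res$AIC[i] <- AIC(fit)
  res$BIC[i] <- BIC(fit)
}

# Five best models by AIC
print(head(res[order(res$AIC), ], 5))
```

Only the formula string and the two criteria are kept for each fit, so the fitted models themselves are not held in memory, which matters on a large dataset.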
Upvotes: 2