may.the.bee

How to specify custom interactions (one binary variable with all other variables) for a caret model?

I am running a glmnet model in R using the caret package and doing repeated nested cross-validation with the nestedcv package. I need to include a "custom" interaction between one of my binary categorical features (coded 0/1) and each of the other features (some of which are numerical, while others are categorical and coded as 0/1).

See an example below, without interactions:

# Load packages
library(dplyr)    # provides %>% and select()
library(caret)
library(nestedcv)

# Check out data
head(mtcars)

# Select features:
features <- mtcars %>%
  select(cyl, disp, vs, am) %>%
  data.matrix()

# Define outcome column:
outcome <- mtcars %>%
  select(mpg) %>%
  data.matrix()

# Set model parameters:
myControl <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 5)     

# Define tuning grid:
myGrid <- expand.grid(alpha = seq(0.1, 0.9, length.out = 10),
                      lambda = seq(0.1, 0.9, length.out = 10))
  
# Tuning both alpha and lambda:
set.seed(123, "L'Ecuyer-CMRG") # for reproducibility
model_ncv <- nestcv.train(
  x = features,
  y = outcome[, 1],
  method = "glmnet",
  outer_method = "cv",
  n_outer_folds = 5,
  trControl = myControl,
  tuneGrid = myGrid,
  metric = "RMSE"
)

Now say I want to run the same model with an added interaction of the variable "vs" with all other variables but no other interactions. How do I do that?

I am aware that the standard caret train() function accepts a formula, so interactions can be specified directly (e.g., train(mpg ~ (cyl + disp + am) * vs, data = mtcars, method = "glmnet", trControl = myControl)), but nestcv.train() requires x and y to be specified separately.

I assume I can create new variables in my dataset that represent the interactions, but I am not sure how to go about this in R. For example, this tutorial shows it is possible to simply multiply the variables that are interacting, but its example uses numeric/continuous variables only. Or is it OK to just multiply every other feature by the 0s/1s that represent each category?
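To make the multiplication idea concrete, here is a minimal sketch of what it could look like on mtcars. It assumes vs is coded 0/1 and that every other feature is numeric or already 0/1 coded; a categorical feature with more than two levels would need dummy coding first. The names main, others, inter and features_int are made up for illustration.

```r
# Sketch only: build vs-by-feature interaction columns by direct multiplication.
# Assumes vs is 0/1 and all other columns are numeric or 0/1 coded.
main <- as.matrix(mtcars[, c("cyl", "disp", "am", "vs")])
others <- setdiff(colnames(main), "vs")

# Multiply each remaining column elementwise by the 0/1 indicator.
# The vector has one value per row, so it is recycled down each column.
inter <- main[, others, drop = FALSE] * main[, "vs"]
colnames(inter) <- paste0(others, ":vs")

# Main effects plus the three interaction columns.
features_int <- cbind(main, inter)
head(features_int)
```

Each interaction column equals the original feature where vs is 1 and zero where vs is 0, which is the usual coding of an interaction with a binary indicator.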

I believe the model.matrix() function might help me here but because I don't know anything about design matrices I am afraid I would do it incorrectly.
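For reference, a minimal sketch of the model.matrix() route, assuming the same mtcars columns as above and treating cyl as numeric, as in the original code. The formula (cyl + disp + am) * vs expands to all four main effects plus the interaction of each of the other features with vs, and nothing else. The name X is made up for illustration.

```r
# Sketch only: let model.matrix() expand the formula into a design matrix.
X <- model.matrix(mpg ~ (cyl + disp + am) * vs, data = mtcars)

# glmnet fits its own intercept, so the intercept column is dropped here.
X <- X[, colnames(X) != "(Intercept)", drop = FALSE]

# Inspect which main-effect and interaction columns were created.
colnames(X)
```

Under these assumptions, X could then be passed as x = X (with y = mtcars$mpg) to nestcv.train() in place of the features matrix, since that call already takes a numeric matrix.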

Any help will be greatly appreciated.
