Pritam Paramanick

Reputation: 11

Regression model with all possible two-way interaction terms in R

I have a data set with 8 variables: one response and seven predictors. I need every possible two-way interaction term, one at a time, alongside the seven predictors. So in my case there will be 7C2 = 21 models in total, each containing the 7 predictors and one two-way interaction term.

I have tried to produce the 21 models using a for loop, but the code fails at the lm() call inside the loop. In my problem, return is the response variable, in the fifth column of my data.

colnames(dt) = c("assets","turnover_ratio","SD","sharpe_ratio","return",
                 "expense_ratio","fund_dummy","risk_dummy")
vars=colnames(dt)[-5] 
for (i in vars)  {
  for (j in vars) {
    if (i != j) {
      factor= paste(i,j,sep='*')}
    lm.fit <- lm(paste("return ~", factor), data=dt)
    print(summary(lm.fit))
  }}

The error message is given below for the code:

Error in paste("return ~", factor) : cannot coerce type 'closure' to vector of type 'character'

This is my data set: data set

The output below is the desired output; 20 more such models are needed, one for each of the other possible two-way interaction terms. All 7 predictors should be present in each model. The only thing that should change is the two-way interaction term.

This is my desired output among the 21 required: one desired output among the 21 required outputs

Upvotes: 1

Views: 2945

Answers (3)

Rui Barradas

Reputation: 76450

The following apply loop gets all pairwise interactions between the 7 variables. The 21 pairs are first obtained with combn.

vars <- colnames(dt)[-5] 
resp <- colnames(dt)[5] 

cmb <- combn(vars, 2)

lm_list <- apply(cmb, 2, function(regrs){
  inter_regrs <- paste(regrs, collapse = "*")
  other_regrs <- setdiff(vars, regrs)
  all_regrs <- paste(other_regrs, collapse = "+")
  all_regrs <- paste(all_regrs, inter_regrs, sep = "+")
  fmla <- as.formula(paste(resp, all_regrs, sep = "~"))
  lm(fmla, data = dt)
})

lapply(lm_list, summary)

Data creation code.

set.seed(1234)
dt <- replicate(8, rnorm(100))
dt <- as.data.frame(dt)

colnames(dt) <- c("assets","turnover_ratio","SD",
              "sharpe_ratio","return","expense_ratio",
              "fund_dummy","risk_dummy")
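
As a possible variation on the approach above, here is a minimal sketch that names each fitted model by its interaction term and collects that term's estimate and p-value in one table. It assumes simulated data like the data creation code above; the names fmla_strings, models, n_coef and inter_tab are illustrative and not part of the original answer. It lists all seven main effects explicitly and writes the interaction with `:`, which should describe the same model as the `*` form.

```r
# Simulated data with the question's column names (an assumption for illustration)
set.seed(1234)
dt <- as.data.frame(replicate(8, rnorm(100)))
colnames(dt) <- c("assets", "turnover_ratio", "SD", "sharpe_ratio", "return",
                  "expense_ratio", "fund_dummy", "risk_dummy")

vars <- colnames(dt)[-5]   # the 7 predictors
resp <- colnames(dt)[5]    # the response, "return"
cmb  <- combn(vars, 2)     # 2 x 21 matrix of predictor pairs

# One formula string per pair: all 7 main effects plus a single a:b interaction
fmla_strings <- apply(cmb, 2, function(pair) {
  paste(resp, "~", paste(vars, collapse = " + "), "+",
        paste(pair, collapse = ":"))
})

# Fit the 21 models and name each one after its interaction term
models <- lapply(fmla_strings, function(f) lm(as.formula(f), data = dt))
names(models) <- apply(cmb, 2, paste, collapse = ":")

# Each fit should have 9 coefficients: intercept, 7 main effects, 1 interaction
n_coef <- sapply(models, function(m) length(coef(m)))

# Estimate and p-value of the interaction term, one row per model
inter_tab <- t(sapply(names(models), function(nm) {
  coef(summary(models[[nm]]))[nm, c("Estimate", "Pr(>|t|)")]
}))
```

With this layout, summary(models[["assets:SD"]]) would show the full summary for one particular pair, and inter_tab gives a quick overview across all pairs.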

Upvotes: 2

mptrossbach

Reputation: 384

I think this should work and allow you to get rid of the loops:

lm.fit = lm(return ~ (.)^2, data=dt)
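
Note that, as far as I can tell, this formula puts all 21 two-way interactions into a single model rather than producing 21 separate fits. A minimal sketch to check this on simulated data (the data and the names full_fit, n_terms and n_coef are assumptions for illustration):

```r
# Simulated data with the question's column names (an assumption for illustration)
set.seed(1234)
dt <- as.data.frame(replicate(8, rnorm(100)))
colnames(dt) <- c("assets", "turnover_ratio", "SD", "sharpe_ratio", "return",
                  "expense_ratio", "fund_dummy", "risk_dummy")

# One model: every main effect and every two-way interaction at once
full_fit <- lm(return ~ .^2, data = dt)

# Count the model terms (main effects plus pairwise interactions)
n_terms <- length(attr(terms(full_fit), "term.labels"))

# Count the coefficients (the terms above plus the intercept)
n_coef <- length(coef(full_fit))
```

Here n_terms should be 7 + 21 = 28, so this is a useful single model but not the one-interaction-at-a-time set the question asks for.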

Upvotes: 1

Santiago I. Hurtado

Reputation: 1123

Your problem is where the if statement ends: the closing brace comes before the lm() call, so lm() runs even when i and j are equal. This code should work:

colnames(dt) <- c("assets","turnover_ratio","SD","sharpe_ratio","return",
                  "expense_ratio","fund_dummy","risk_dummy")
vars <- colnames(dt)[-5]
for (i in vars) {
  for (j in vars) {
    if (i != j) {
      # build the term and fit the model only when i and j differ
      inter_term <- paste(i, j, sep = "*")
      lm.fit <- lm(paste0("return ~", inter_term), data = dt)
      print(summary(lm.fit))
    }
  }
}

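The problem was that on the first iteration, where i and j are equal, the variable factor had not been defined yet, so R found the base function factor() instead, which is what the 'closure' error refers to. Also, try not to name a variable factor, since factor is a function in R.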
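
To illustrate the point about the name, here is a small sketch of what happens when factor is used before anything has been assigned to it. The names err, inter_term and f are hypothetical, and the exact wording of the error message may vary between R versions.

```r
# With no variable of that name assigned, `factor` resolves to the base function
is_fun <- is.function(factor)

# paste() cannot turn a function into text, so this call raises an error;
# tryCatch() captures the message instead of stopping the script
err <- tryCatch(paste("return ~", factor),
                error = function(e) conditionMessage(e))

# Using a name that does not clash with a function avoids the ambiguity
inter_term <- paste("assets", "SD", sep = "*")
f <- paste("return ~", inter_term)
```

After running this, err holds the error message as a string and f holds a formula string that lm() can use.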

Upvotes: 1
