Model selection in R, do I include interactions between variables?

Question

So I have looked around, and I can't seem to work this out from what I've found.

I'm trying to calculate BIC for three models I have,

resistivity1 = rho_i*(1 + (3/8)*lam*(1/thickness))

resistivity2 = rho_i*(1 + (3/2)*lam*(1/grains)*(R/(1-R)))

resistivity3 = rho_i*(1 + (3/8)*lam*(1/thickness) + (3/2)*lam*(1/grains)*(R/(1-R)))

R, lam, and rho_i are constants that I have left out for now; I have particular values for them.

This is what I've written in R,

BIC(lm(formula = resistivity ~ 1 + (3/8)*I(1/thickness), data=z))

BIC(lm(formula = resistivity ~ 1 + (3/2)*I(1/grains), data=z)) 

BIC(lm(formula = resistivity ~ I(1 + (3/8)*I(1/thickness) + (3/2)*I(1/grains)), data=z))

If anyone needs to know, this is what the head of my data looks like,

|   | thickness | grains | resistivity |
|---|-----------|--------|-------------|
| 1 |     524.4 |   1829 |        15.6 | 
| 2 |     670.5 |   3155 |    450000.0 |
| 3 |     943.4 |   3859 |        22.1 |
| 4 |    1072.3 |   4585 |        10.9 |

Basically, I don't know if what I've written in R is the same as the models defined above. Is it necessary to include the interactions resistivity:thickness, resistivity:grains as well?

Thanks in advance.

setempler · Accepted Answer

You can run an ANOVA on two nested models, one with and one without the interaction. It shows whether adding the interaction explains your data significantly better. If it does not, leave it out.

Since you did not include a reproducible example, take this as a guide:

anova(lm(y~a+b), lm(y~a+b+a:b))

This tells you whether adding the interaction a:b improves the model.
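As a small sketch of the formula notation used in the one-liner above: inside an R formula, `*` and `:` are model operators rather than arithmetic, so `a * b` expands to `a + b + a:b`, while literal arithmetic has to be wrapped in `I()`. The names y, a and b are placeholders, not columns from the question's data.

```r
# y, a and b are placeholder names; terms() only parses the formula, so no data is needed.
labels_crossed <- attr(terms(y ~ a * b), "term.labels")
labels_spelled <- attr(terms(y ~ a + b + a:b), "term.labels")
print(labels_crossed)
print(identical(labels_crossed, labels_spelled))

# Wrapping the product in I() makes it ordinary multiplication:
# the model then has a single term and no separate main effects.
labels_arith <- attr(terms(y ~ I(a * b)), "term.labels")
print(length(labels_arith))
```

This is why multiplying by a constant such as 3/8 outside `I()` may not do what plain arithmetic would suggest.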

On a real dataset (the model itself is not meaningful):

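library(reshape2)   # assumption: french_fries is taken from the reshape2 package
data(french_fries)
# Pass the data explicitly instead of attach(), which can mask other objects
anova(lm(potato ~ time + treatment, data = french_fries), # model 1 with no interaction
      lm(potato ~ time * treatment, data = french_fries)) # model 2 with interaction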

The output shows that adding the interaction lowers the RSS, but the improvement is not significant (p = 0.94):

Analysis of Variance Table

Model 1: potato ~ time + treatment
Model 2: potato ~ time * treatment
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1    683 8128.6                           
2    665 8012.6 18    115.93 0.5345 0.9422
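Since the question asks about BIC, the same kind of comparison could also be sketched with `BIC()`. This is only an illustration under stated assumptions: it uses the built-in `mtcars` data, with `wt` and `hp` standing in for any two predictors, not the resistivity data from the question.

```r
# Illustrative only: mtcars ships with base R; wt and hp stand in for any two predictors.
fit_add <- lm(mpg ~ wt + hp, data = mtcars)   # main effects only
fit_int <- lm(mpg ~ wt * hp, data = mtcars)   # main effects plus the wt:hp interaction

# BIC() accepts several fitted models and returns a data frame with df and BIC columns.
bic_table <- BIC(fit_add, fit_int)
print(bic_table)

# The F-test from anova() applies here because the two models are nested.
print(anova(fit_add, fit_int))
```

A lower BIC indicates the preferred model, provided both models are fitted to the same response and the same rows of data.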
