Philip de Bruin

Reputation: 41

Can I visualise models by plotting them with the original data on the same graph in R?

I am new to R, and only have a basic understanding of statistics. I am learning how to use factorial experiments and how to fit models to their results, following Design and Analysis of Experiments (Montgomery 2013, ISBN: 9781118097939).

I used example 5.3 from the textbook. The data can be seen in the code below.

bottel_vul_data <- data.frame(A   = rep(c(10, 12, 14), each = 2),  # recycled to 24 rows
                              B   = rep(c(25, 30), each = 12),
                              C   = rep(c(200, 250), each = 6),    # recycled to 24 rows
                              vul = c(-3, -1, 0, 1,  5,  4,
                                      -1,  0, 2, 1,  7,  6,
                                      -1,  0, 2, 3,  7,  9,
                                       1,  1, 6, 5, 10, 11))
# bottel_vul_data

and completed the ANOVA

bottel_anova <- aov(vul ~ factor(A) * B * C, data = bottel_vul_data)
summary(bottel_anova)

which yielded the same results as the textbook

              Df Sum Sq Mean Sq F value   Pr(>F)    
factor(A)      2 252.75  126.38 178.412 1.19e-09 ***
B              1  45.37   45.37  64.059 3.74e-06 ***
C              1  22.04   22.04  31.118  0.00012 ***
factor(A):B    2   5.25    2.63   3.706  0.05581 .  
factor(A):C    2   0.58    0.29   0.412  0.67149    
B:C            1   1.04    1.04   1.471  0.24859    
factor(A):B:C  2   1.08    0.54   0.765  0.48687    
Residuals     12   8.50    0.71                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

I then plotted the data to visualise it and compared four linear regression models:

  1. only considering factors A, B and C;
  2. considering factors A, B and C as well as the interactions A:B, A:C and B:C;
  3. considering all factors and interactions;
  4. and finally considering factors A, B, C and the interaction A:B.

The fourth model should be adequate because it includes all factors and interactions that show a statistically significant difference. Finally, I completed an ANOVA on the four models.

data <- bottel_vul_data[c('vul', 'A', 'B', 'C')]
plot(data)

par(mfrow = c(2,2))
boxplot(vul ~ A, data = data, main = '% carbonation')
boxplot(vul ~ B, data = data, main = 'operating pressure')
boxplot(vul ~ C, data = data, main = 'line speed')
boxplot(vul ~ A * B * C, data = data, main = 'A B C')

par(mfrow = c(2,2))
interaction.plot(data$A, data$B, data$vul)
interaction.plot(data$A, data$C, data$vul)
interaction.plot(data$B, data$C, data$vul)
par(mfrow = c(1,1))

model1 <- lm(vul ~ ., data = data)
summary(model1)

model2 <- lm(vul ~ .^2, data = data)
summary(model2)

model3 <- lm(vul ~ .^3, data = data)
summary(model3)

model4 <- lm(vul ~ A + B + C + A:B, data = data)
summary(model4)

anova(model1, model2, model3, model4)

The output of the ANOVA is shown below

Analysis of Variance Table

  Res.Df      RSS Df Sum of Sq       F  Pr(>F)
1     20 21.14583 NA        NA      NA      NA
2     17 14.47917  3  6.666667 2.46628 0.09959
3     16 14.41667  1  0.062500 0.06936 0.79563
4     19 16.08333 -3 -1.666667 0.61657 0.61424

This suggests that there is no statistically significant difference between Models 3 and 4; therefore, Model 4 should be the simplest model that describes the data adequately.
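One caveat (a sketch, using the model objects fitted above): anova() performs sequential F-tests, so the models should be supplied in order of increasing complexity, each nested in the next. Because model4 is simpler than model3, the last comparison has a negative Df. Reordering the call, or testing model4 directly against the full two-way model, gives proper tests:

```r
# Sequential comparison with the models ordered simplest -> most complex
# (model1 nested in model4, nested in model2, nested in model3),
# so every Df is non-negative:
anova(model1, model4, model2, model3)

# Direct F-test of model4 against the full two-way interaction model:
anova(model4, model2)
```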

I would like to know whether there is a way to visualise the fitted models by plotting them along with the real data on a graph in R. I would also like to know whether there is a way to check that the model is actually representative of the data. Finally, I want to fit second-order models as well, so that I can obtain response surfaces, but I have no idea how to do this in R. Is there a function like lm() that I can use?

Upvotes: 0

Views: 65

Answers (1)

Allan Cameron

Reputation: 174468

The problem is really one of demonstrating multiple dimensions of data in a single plot. If you show one of your variables (such as A) on the x axis, with vul on the y axis, then you have already used up your two dimensions. It is possible to represent extra dimensions using color, or point size, or alpha, or facets, though it can be hard to show regression results this way.

With this example, we can plot vul against A, and since there are only two values of the C variable, we can facet by C. To show the effect of B, we can draw several different regression lines in each panel at different values of B:

library(ggplot2)

new_df <- expand.grid(A = seq(min(data$A), max(data$A), length.out = 10),
                      C = c(min(data$C), max(data$C)),
                      B = seq(min(data$B), max(data$B), length.out = 20))

new_df$vul <- predict(model4, new_df)

ggplot(new_df, aes(A, vul, color = B, group = B)) +
  geom_line(alpha = 0.9) +
  geom_point(data = data, shape = 21, size = 3, aes(fill = B), color = 'black') +
  facet_wrap(paste('C =', C) ~ .) +
  scale_color_viridis_c() +
  scale_fill_viridis_c() +
  theme_bw(base_size = 16)

[Plot: fitted regression lines for model4, coloured by B and faceted by C, with the original data points overlaid]

Note that we have our original data on the plot, with each variable being identifiable by its panel, position and color.

We can see that with our model, vul increases with A, since all the lines are up-sloping. We can see vul increases with C because the lines in the second panel are just shifted upwards with higher values of C, and we can see that there is a positive interaction between A and B because the lines in each panel get steeper as B gets higher (i.e. they 'fan out').

Note that it would be possible to put B on the facets and C on the color scale to get an equally valid plot; it just depends on what you wish to illustrate. I think the above example demonstrates nicely that both A and C are positively associated with vul, and that there is an interaction between A and B. However, it doesn't show us the direction of B's association with vul, or any of the intercept values. We could create plots to illustrate these features of the model if they were the things we wanted to demonstrate.
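The question also asked how to check that the model represents the data and how to fit second-order models. A minimal sketch, assuming the data and model4 objects from the question (model_so is just an illustrative name): calling plot() on an lm object gives the standard residual diagnostics, and lm() itself accepts quadratic terms via I(). In this design only A has three levels, so only its quadratic effect is estimable; B and C each have just two levels.

```r
# Standard residual diagnostics for model4: residuals vs fitted,
# normal Q-Q, scale-location, and residuals vs leverage
par(mfrow = c(2, 2))
plot(model4)
par(mfrow = c(1, 1))

# Second-order model: a quadratic term is only estimable for A,
# since B and C each have only two levels
model_so <- lm(vul ~ A + I(A^2) + B + C + A:B, data = data)
summary(model_so)
```

For designed response-surface experiments, the rsm package provides rsm(), which works much like lm() but adds canonical analysis and contour/perspective plots of the fitted surface.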

Upvotes: 1
