Omar123456789
Omar123456789

Reputation: 61

ANOVA function when comparing logistic models has no p-value for Deviance

I'm working with the biopsy dataset from the MASS library in R. I'm at the initial stages of creating a logistic regression model to see what variables have an impact in the probability of having a malignant tumor. I removed all rows with missing data (around 16 observations). All variables are significant on their own, so I started with the fullest model with all variables included and the third variable (V3 - Uniformity of cell size) was the least significant in this fullest model possible.

I created another model with V3 removed. I then wanted to use the anova() function to see if there is a significant difference in the fits of the two models. However, I get no p-value from my anova test. Does this mean the p-value is nearly is 1? Have I made an error somewhere in my model set up?

All input is appreciated!

#post removal of rows with missing data from biopsy in library(MASS)     
relevel(biopsy$class, ref = "malignant")
#assigns value of interst to malignant instead of benign. 
fullest.model = glm(biopsy$class~biopsy[,2]+biopsy[,3]+biopsy[,4]+biopsy[,5]+
                  biopsy[,6]+biopsy[,7]+biopsy[,8]+biopsy[,9]+biopsy[,10]
                ,family = binomial(link = "logit"))
model1 = glm(biopsy$class~biopsy[,2]+biopsy[,4]+biopsy[,5]+
           biopsy[,6]+biopsy[,7]+biopsy[,8]+biopsy[,9]+biopsy[,10]
         ,family = binomial(link = "logit"))
anova(model1, fullest.model)

Output I get:

      Resid. Df Resid. Dev Df   Deviance
1       674     102.89              
2       673     102.89  1 0.00090001

^See no pvalue!!

Upvotes: 2

Views: 13585

Answers (1)

Maurits Evers
Maurits Evers

Reputation: 50668

  1. We generate some sample data, assuming the GLM y = 0.5 * x1 + 4 * x2.

    # Generate some sample data
    x1 <- 1:100;
    x2 <- gl(2, 50, 100);
    set.seed(2017);
    y <- 0.5 * x1 + 4 * as.numeric(x2) + rnorm(100);
    
  2. We now fit two models:

    • fit1 estimates coefficients for model y = beta0 + beta1 * x1,
    • fit2 estimates coefficients for model y = beta0 + beta1 * x1 + beta2 * x2.

    # Fit two models
    fit1 <- glm(y ~ x1 + x2);
    fit2 <- glm(y ~ x1);
    
  3. Perform ANOVA analyses.

    # Default ANOVA (note this does not perform any hypothesis test)
    anova(fit1, fit2);
    #Analysis of Deviance Table
    #
    #Model 1: y ~ x1 + x2
    #Model 2: y ~ x1
    #  Resid. Df Resid. Dev Df Deviance
    #1        97     112.11
    #2        98     213.39 -1  -101.28
    
    # ANOVA with likelihood ratio test
    anova(fit1, fit2, test = "Chisq");
    #Analysis of Deviance Table
    #
    #Model 1: y ~ x1 + x2
    #Model 2: y ~ x1
    #  Resid. Df Resid. Dev Df Deviance  Pr(>Chi)
    #1        97     112.11
    #2        98     213.39 -1  -101.28 < 2.2e-16 ***
    #---
    #Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    

    Note that the first ANOVA comparison does not perform any hypothesis test. It simply calculates the change in deviance between the two models. The second ANOVA analysis anova(..., test = "Chisq") performs a likelihood ratio test (it is the same as anova(..., test = "LRT")), by calculating the probability for observing a chi-squared distributed test statistic (i.e. the change in deviance) as extreme or more extreme. This latter quantity corresponds to the p-value of your hypothesis test.

  4. Lastly, have a look at this link. It provides more details on how to perform and interpret the output of an ANOVA analysis.

Upvotes: 5

Related Questions