MJW

Reputation: 141

Likelihood ratio test: 'models were not all fitted to the same size of dataset'

I'm an absolute R beginner and need some help with my likelihood ratio tests for my univariate analyses. Here's the code:

#Univariate analysis for conscientiousness (categorical)
fit <- glm(BCS_Bin~Conscientiousness_cat,data=dat,family=binomial)
summary(fit)

#Likelihood ratio test (lrtest comes from the lmtest package)
library(lmtest)
fit0<-glm(BCS_Bin~1, data=dat, family=binomial)
summary(fit0)
lrtest(fit, fit0)

The results are:

Call:
glm(formula = BCS_Bin ~ Conscientiousness_cat, family = binomial, 
data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8847  -0.8847  -0.8439   1.5016   1.5527  

Coefficients:
                         Estimate Std. Error z value Pr(>|z|)    
(Intercept)              -0.84933    0.03461 -24.541   <2e-16 ***
Conscientiousness_catLow  0.11321    0.05526   2.049   0.0405 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 7962.1  on 6439  degrees of freedom
Residual deviance: 7957.9  on 6438  degrees of freedom
(1963 observations deleted due to missingness)
AIC: 7961.9

Number of Fisher Scoring iterations: 4

And:

Call:
glm(formula = BCS_Bin ~ 1, family = binomial, data = dat)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-0.8524  -0.8524  -0.8524   1.5419   1.5419  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.82535    0.02379  -34.69   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 10251  on 8337  degrees of freedom
Residual deviance: 10251  on 8337  degrees of freedom
(65 observations deleted due to missingness)
AIC: 10253

Number of Fisher Scoring iterations: 4

For my LRT:

Error in lrtest.default(fit, fit0) : 
models were not all fitted to the same size of dataset

I understand that this is happening because different numbers of observations are missing for each model? The data come from a large questionnaire, and many more dropouts had occurred by the question assessing my predictor variable (conscientiousness) than by the question assessing the outcome variable (body condition score/BCS). So I simply have more data for BCS than for conscientiousness (the same error occurs for many of my other variables too).

Upvotes: 3

Views: 3154

Answers (1)

eipi10

Reputation: 93851

In order to run the likelihood ratio test, the model with just the intercept has to be fit to the same observations as the model that includes Conscientiousness_cat. So, you need the subset of the data that has no missing values for Conscientiousness_cat:

dat_subset = dat[complete.cases(dat[, "Conscientiousness_cat"]), ]

You can run both models on this subset of the data and the likelihood ratio test should run without error.
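For example, with a toy data frame standing in for your data (made-up values, since I don't have your dataset; the column names mirror your code):

```r
# Toy stand-in for the questionnaire data: a binary outcome with no missing
# values, and a categorical predictor where some responses are missing
set.seed(1)
dat <- data.frame(
  BCS_Bin = rbinom(200, 1, 0.3),
  Conscientiousness_cat = sample(c("High", "Low", NA), 200, replace = TRUE)
)

# Keep only the rows where the predictor is observed
dat_subset <- dat[complete.cases(dat[, "Conscientiousness_cat"]), ]

# Fit both models to the same subset
fit  <- glm(BCS_Bin ~ Conscientiousness_cat, data = dat_subset, family = binomial)
fit0 <- glm(BCS_Bin ~ 1, data = dat_subset, family = binomial)

# Both models now use the same observations, so lrtest(fit, fit0) runs
c(nobs(fit), nobs(fit0))
```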

In your case, you could also do:

dat_subset = dat[!is.na(dat$Conscientiousness_cat), ]

However, it's nice to have complete.cases handy when you want a subset of a data frame with no missing values across multiple variables.
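For instance, with a small data frame (illustrative column names), complete.cases lets you require completeness in exactly the columns you care about:

```r
# Toy data frame with missing values scattered across columns
df <- data.frame(a = c(1, NA, 3, 4),
                 b = c("x", "y", NA, "z"),
                 c = c(NA, 20, 30, 40))

# Rows complete in both a AND b (NAs in c are ignored): rows 1 and 4
df_subset <- df[complete.cases(df[, c("a", "b")]), ]
nrow(df_subset)
```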

Another option, which is more convenient if you're going to run multiple models but also a bit more complex, is to first fit whatever model uses the largest number of variables (since that model will exclude the most observations due to missingness), and then use the update function to derive the models with fewer variables. We just need to make sure that update uses the same observations each time, which we do with the wrapper function defined below. Here's an example using the built-in mtcars data frame:

library(lmtest)

dat = mtcars

# Create some missing values in mtcars
dat[1, "wt"] = NA
dat[5, "cyl"] = NA
dat[7, "hp"] = NA

# Wrapper function to ensure the same observations are used for each 
#  updated model as were used in the first model
# From https://stackoverflow.com/a/37341927/496488
update_nested <- function(object, formula., ..., evaluate = TRUE){
  # object$model is the model frame: exactly the rows used to fit 'object',
  # so every updated model is fit to those same observations
  update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}

m1 = lm(mpg ~ wt + cyl + hp, data=dat)
m2 = update_nested(m1, . ~ . - wt)  # Remove wt
m3 = update_nested(m1, . ~ . - cyl) # Remove cyl
m4 = update_nested(m1, . ~ . - wt - cyl) # Remove wt and cyl
m5 = update_nested(m1, . ~ . - wt - cyl - hp) # Remove all three variables (i.e., model with intercept only)

lrtest(m5, m4, m3, m2, m1)
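To see why the wrapper is needed: nobs() reports how many observations a fitted model actually used, and lrtest raises the "same size of dataset" error when these counts differ. A naive update refits on the full data frame, so a dropped variable no longer forces its NA rows out. A small standalone sketch (m_full, m_naive, and m_nested are illustrative names):

```r
dat <- mtcars
dat[1, "wt"] <- NA                    # introduce one missing value

m_full  <- lm(mpg ~ wt + cyl, data = dat)  # drops the row with missing wt: 31 obs
m_naive <- lm(mpg ~ cyl, data = dat)       # wt not in the formula: all 32 obs

# Refitting on m_full's model frame keeps the same 31 observations
m_nested <- update(m_full, . ~ . - wt, data = m_full$model)

c(nobs(m_full), nobs(m_naive), nobs(m_nested))  # 31, 32, 31
```

The 31 vs. 32 mismatch between m_full and m_naive is exactly what data = object$model in the wrapper avoids.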

Upvotes: 1
