Reputation: 141
I'm an absolute R beginner and need some help with my likelihood ratio tests for my univariate analyses. Here's the code:
#Univariate analysis for conscientiousness (categorical)
fit <- glm(BCS_Bin~Conscientiousness_cat,data=dat,family=binomial)
summary(fit)
#Likelihood ratio test
fit0<-glm(BCS_Bin~1, data=dat, family=binomial)
summary(fit0)
lrtest(fit, fit0)
The results are:
Call:
glm(formula = BCS_Bin ~ Conscientiousness_cat, family = binomial,
data = dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8847 -0.8847 -0.8439 1.5016 1.5527
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.84933 0.03461 -24.541 <2e-16 ***
Conscientiousness_catLow 0.11321 0.05526 2.049 0.0405 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 7962.1 on 6439 degrees of freedom
Residual deviance: 7957.9 on 6438 degrees of freedom
(1963 observations deleted due to missingness)
AIC: 7961.9
Number of Fisher Scoring iterations: 4
And:
Call:
glm(formula = BCS_Bin ~ 1, family = binomial, data = dat)
Deviance Residuals:
Min 1Q Median 3Q Max
-0.8524 -0.8524 -0.8524 1.5419 1.5419
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.82535 0.02379 -34.69 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 10251 on 8337 degrees of freedom
Residual deviance: 10251 on 8337 degrees of freedom
(65 observations deleted due to missingness)
AIC: 10253
Number of Fisher Scoring iterations: 4
For my LRT:
Error in lrtest.default(fit, fit0) :
models were not all fitted to the same size of dataset
I understand that this is happening because there's different numbers of observations missing? That's because it is data from a large questionnaire, and many more drop outs had occurred by the question assessing my predictor variable (conscientiousness) when compared with the outcome variable (body condition score/BCS). So I just have more data for BCS than conscientiousness, for example (it's producing the same error for many of my other variables too).
Upvotes: 3
Views: 3154
Reputation: 93851
In order to run the likelihood ratio test, the model with just the intercept has to be fit to the same observations as the model that includes Conscientiousness_cat
. So, you need the subset of the data that has no missing values for Conscientiousness_cat
:
BCS_bin_subset = BCS_bin[complete.cases(BCS_bin[,"Conscientiousness_cat"]), ]
You can run both models on this subset of the data and the likelihood ratio test should run without error.
In your case, you could also do:
BCS_bin_subset = BCS_bin[!is.na(BCS_bin$Conscientiousness_cat), ]
However, it's nice to have complete.cases
handy when you want a subset of a data frame with no missing values across multiple variables.
Another option that is more convenient if you're going to run multiple models, but that is also more complex is to first fit whatever model uses the largest number of variables from BCS_bin
(since that model will exclude the largest number of observations due to missingness) and then use the update
function to update that model to models with fewer variables. We just need to make sure that update
uses the same observations each time, which we do using a wrapper function defined below. Here's an example using the built-in mtcars
data frame:
library(lmtest)
dat = mtcars
# Create some missing values in mtcars
dat[1, "wt"] = NA
dat[5, "cyl"] = NA
dat[7, "hp"] = NA
# Wrapper function to ensure the same observations are used for each
# updated model as were used in the first model
# From https://stackoverflow.com/a/37341927/496488
update_nested <- function(object, formula., ..., evaluate = TRUE){
update(object = object, formula. = formula., data = object$model, ..., evaluate = evaluate)
}
m1 = lm(mpg ~ wt + cyl + hp, data=dat)
m2 = update_nested(m1, . ~ . - wt) # Remove wt
m3 = update_nested(m1, . ~ . - cyl) # Remove cyl
m4 = update_nested(m1, . ~ . - wt - cyl) # Remove wt and cyl
m5 = update_nested(m1, . ~ . - wt - cyl - hp) # Remove all three variables (i.e., model with intercept only)
lrtest(m5,m4,m3,m2,m1)
Upvotes: 1