Reputation: 91
This is my data frame (please copy and paste to reproduce):
Control <- replicate(2, c("112", "113", "116", "118", "127", "131", "134", "135", "136", "138", "143", "148", "149", "152", "153", "155", "162", "163"))
EPD <- replicate(2, c("101", "102", "103", "104", "105", "106", "107", "108", "109", "110", "114", "115", "117", "119", "120", "122", "124", "125", "126", "128", "130", "133", "137", "139", "140", "141", "142", "144", "145", "147"))
Subject <- c(Control, EPD)
Control_FA_L <- c(0.43, 0.39, 0.38, 0.58, 0.37, 0.5, 0.35, 0.36, 0.72, 0.38, 0.45, 0.30, 0.47, 0.30, 0.67, 0.34, 0.42, 0.29)
Control_FA_R <- c(0.36, 0.49, 0.55, 0.59, 0.33, 0.41, 0.32, 0.50, 0.59, 0.52, 0.32, 0.40, 0.49, 0.33, 0.46, 0.39, 0.37, 0.33)
EPD_FA_L <- c(0.25, 0.39, 0.36, 0.42, 0.21, 0.40, 0.43, 0.16, 0.31, 0.41, 0.39, 0.40, 0.35, 0.29, 0.31, 0.24, 0.39, 0.36, 0.54, 0.38, 0.34, 0.28, 0.42, 0.33, 0.40, 0.36, 0.42, 0.28, 0.40, 0.41)
EPD_FA_R <- c(0.26, 0.36, 0.36, 0.61, 0.22, 0.33, 0.36, 0.34, 0.35, 0.37, 0.39, 0.45, 0.30, 0.31, 0.50, 0.31, 0.29, 0.43, 0.41, 0.21, 0.38, 0.28, 0.66, 0.33, 0.50, 0.27, 0.46, 0.37, 0.26, 0.39)
FA <- c(Control_FA_L, Control_FA_R, EPD_FA_L, EPD_FA_R)
Control_Volume_L <- c(99, 119, 119, 146, 127, 96, 100, 132, 103, 103, 107, 142, 140, 134, 117, 117, 133, 143)
Control_Volume_R <- c(93, 123, 114, 152, 122, 105, 98, 138, 111, 110, 115, 137, 142, 140, 124, 102, 153, 143)
EPD_Volume_L <- c(132, 115, 140, 102, 130, 131, 110, 124, 102, 111, 93, 92, 94, 104, 92, 115, 144, 118, 104, 132, 90, 102, 94, 112, 106, 105, 79, 114, 104, 108)
EPD_Volume_R <- c(136, 116, 143, 105, 136, 137, 103, 121, 105, 115, 97, 97, 93, 108, 91, 117, 147, 111, 97, 129, 85, 107, 91, 116, 113, 101, 75, 108, 95, 98)
Volume <- c(Control_Volume_L, Control_Volume_R, EPD_Volume_L, EPD_Volume_R)
Group <- c(replicate(36, "Control"), replicate(60, "Patient"))
data <- data.frame(Subject, FA, Volume, Group)
I then run a linear mixed model for FA values with the nlme package:
library(nlme)
lmm <- lme(FA ~ Volume + Group, ~ 1|Subject, data = data)
summary(lmm)
I would now like to determine the 95% Confidence Interval for the model estimated difference in FA between the two levels of the "Group" factor (Control & Patient). I would normally proceed by executing the following code:
# Compute 95% Confidence Interval for Group factor
# True difference in STN FA between Control and EPD subjects
0.0857851 # Value from mixed model
# Multiply 97.5 percentile point of normal distribution by std error from mixed model
1.96 * 0.02555076 # 95% CI: 0.086 ± 0.050 mm^3 (p = .0016) - !!CI includes values > 1!!
I am having a hard time interpreting what this means. The confidence interval I have calculated includes values greater than 1 which doesn't make sense since FA is supposed to be a ratio value from 0 to 1. Is the fact that my dependent variable is a ratio value the issue? If so would I need to transform my data in some way (i.e. log transform) to correct this? Any feedback would be greatly appreciated!
Upvotes: 0
Views: 1269
Reputation: 50718
As pointed out by @42-, the issue here is with the model itself. Since FD
is restricted to [0, 1] we can't use lme
which assumes normal errors.
I don't know any details about your data/experiment but perhaps a Beta model may work; specifically we could use a varying-intercept hierarchical model of the form
where we connect the mean of FD
μ to a Subject
-specific intercept and the predictors Volume
and Group
through a logit-link.
The library glmmTMB
allows to implement such a mixed-effect model
library(glmmTMB)
lmm <- glmmTMB(
FA ~ Volume + Group + (1 | Subject),
data = data,
family = "beta_family")
summary(lmm)$coef$cond
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) 0.502858259 0.348506927 1.442893 0.1490505719
#Volume -0.006464251 0.002782781 -2.322947 0.0201820253
#GroupPatient -0.369273205 0.104832100 -3.522520 0.0004274642
Some comments on the estimates
Note that estimates are given on the logit (log odds) scale; the estimate for Group = Control
is then 0.503 - 0.369 * 0 = 0.503
, and for Group = Patient
it is 0.503 - 0.369 * 1 = 0.134
. The difference between Group = Patient
and Group = Control
(again on the logit scale) is simply the coefficient for GroupPatient
which is -0.369
.
I then recommend using emmeans
for any follow-up analyses; in this case we can use emmeans::pairs
to compare the estimated marginal means (EMMs) for the two Group
levels
library(emmeans)
confint(pairs(emmeans(lmm, "Group")))
# contrast estimate SE df lower.CL upper.CL
# Control - Patient 0.3692732 0.1048321 91 0.1610371 0.5775093
#
#Results are given on the log odds ratio (not the response) scale.
#Confidence level used: 0.95
Note that results are given on the logit scale (and not on the response scale). To get a ratio of FD
responses for Group = Patient
and Group = Control
you need to manually convert these estimates.
Explanation: Here emmeans
returns the EMMs for Group
, and pairs
performs a pairwise comparison of the different levels of Group
. We then use confint
to return (default 95%) confidence intervals.
The nice thing is that you wouldn't have to change anything if you have >2 levels for Group
; pairs
would perform pairwise comparisons, and automatically correct p-values for multiple hypothesis testing.
For more reading, take a look at the excellent vignette Comparisons and contrasts in emmeans.
You can also get estimated marginal means and confidence intervals on the odds ratio scale (which avoids having to manually transform from the logit scale to the odds ratio scale)
confint(pairs(emmeans(lmm, "Group"), type = "response"))
# contrast odds.ratio SE df lower.CL upper.CL
# Control / Patient 1.446683 0.1516588 91 1.174729 1.781595
Confidence level used: 0.95
Intervals are back-transformed from the log odds ratio scale
Upvotes: 0