Reputation: 21
I'm new to statistics and this package. I expected the p-value to stay the same if my data is multiplied or divided by the same number, for example, all *10 or all *100.
But since my data is very small (~10^-9), the p-value is almost 1 to begin with. When I multiply the data (the 'x' in the call below), the p-value decreases until the data reach ~10^-5, after which it no longer changes.
test= lmer(x ~ a + b +c + (1|rep), data=data)
Estimate Std. Error df t value Pr(>|t|)
a2 -5.783e-09 1.232e-09 8.879e-05 -4.693 0.999 (raw data)
a2 -5.783e-08 1.232e-08 6.177e-03 -4.693 0.971 (raw data*10)
a2 -5.783e-07 1.232e-07 3.473e-01 -4.693 0.397 (raw data*100)
a2 -5.783e-06 1.232e-06 7.851e+00 -4.693 0.00164 ** (raw data*1000)
a2 -5.783e-05 1.232e-05 9.596e+01 -4.693 8.95e-06 *** (raw data*10000)
a2 -0.0005783 0.0001232 95.9638425 -4.693 8.95e-06 *** (raw data*100000)
I don't understand why these p-values change and then become constant. Could someone kindly explain this to me?
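For reference, I checked that with a plain lm() the p-value really is scale-invariant, which is why this behaviour surprised me (made-up data, just to illustrate the expectation):

```r
# In ordinary lm(), rescaling the response leaves t-values and p-values
# unchanged: the residual df depend only on the data dimensions.
set.seed(1)
x <- rnorm(30)
y <- 2 * x + rnorm(30)
s1 <- summary(lm(y ~ x))$coefficients
s2 <- summary(lm(I(y * 1e-9) ~ x))$coefficients
all.equal(s1[, "t value"], s2[, "t value"])   # identical t-values
all.equal(s1[, "Pr(>|t|)"], s2[, "Pr(>|t|)"]) # identical p-values
```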
Upvotes: 2
Views: 151
Reputation: 910
Okay, so after a bit of digging, I think I have found a solution and an explanation. As you can see in your example, the t-value does not change; the changes in p-value are due to changes in the estimated degrees of freedom. The default method for estimating these is the Satterthwaite method, which, according to a post by one of the package authors, depends on the scale of the dependent variable (see the post here: https://stats.stackexchange.com/questions/342848/satterthwaite-degrees-of-freedom-in-a-mixed-model-change-drastically-depending-o)
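You can see the df effect directly with base R's pt(): since lmerTest reports a two-sided p-value of 2 * pt(-|t|, df), holding the t-value fixed at -4.693 (the value from your output) and plugging in your df column shows the p-value climbing towards 1 as the df shrink towards zero:

```r
# Two-sided p-value for a fixed t statistic at the df values
# reported in the question's output table.
t_val <- -4.693
df_vals <- c(8.879e-05, 3.473e-01, 7.851, 95.96)
p_vals <- 2 * pt(-abs(t_val), df = df_vals)
p_vals  # climbs towards 1 as df shrinks towards zero
```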
Now, within a normal range of orders of magnitude, the degrees of freedom do not change and the p-values remain constant. You approached this from the other direction in your example, noting that the numbers stopped changing after a certain point (once the DV was sufficiently large). Here I show that they are stable using the iris dataset included with R:
# Preparing data
library(lmerTest)  # provides lmer() with Satterthwaite df and p-values
d <- iris
d$width <- d$Sepal.Width
d$Species <- as.factor(d$Species)
# Creating slightly smaller versions of the DV
d$length <- d$Sepal.Length
d$length_10 <- d$Sepal.Length/10
d$length_1e2 <- d$Sepal.Length/1e2
d$length_1e3 <- d$Sepal.Length/1e3
# fitting the models
m1 <- lmer(length ~ width + (1|Species),data = d)
m2 <- lmer(length_10 ~ width + (1|Species),data = d)
m3 <- lmer(length_1e2 ~ width + (1|Species),data = d)
m4 <- lmer(length_1e3 ~ width + (1|Species),data = d)
# The coefficients are all the same
> summary(m1)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.4061671 0.6683080 3.405002 5.096703 1.065543e-02
width 0.7971543 0.1062064 146.664820 7.505711 5.453404e-12
> summary(m2)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.34061671 0.06683080 3.405002 5.096703 1.065543e-02
width 0.07971543 0.01062064 146.664820 7.505711 5.453404e-12
> summary(m3)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.034061671 0.006683079 3.405003 5.096703 1.065542e-02
width 0.007971543 0.001062064 146.664820 7.505711 5.453405e-12
> summary(m4)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.0034061671 0.0006683079 3.405003 5.096703 1.065542e-02
width 0.0007971543 0.0001062064 146.664820 7.505711 5.453405e-12
However, your numbers are much smaller than this, so I made much smaller versions of the DV to try to re-create your example. As you can see, the estimated degrees of freedom approach zero, which pushes the p-values towards one.
# Much smaller numbers
d$length_1e6 <- d$Sepal.Length/1e6
d$length_1e7 <- d$Sepal.Length/1e7
d$length_1e8 <- d$Sepal.Length/1e8
# fitting the models
m5 <- lmer(length_1e6 ~ width + (1|Species),data = d)
m6 <- lmer(length_1e7 ~ width + (1|Species),data = d)
m7 <- lmer(length_1e8 ~ width + (1|Species),data = d)
# Here we recreate the problem
> summary(m5)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.406167e-06 6.683079e-07 0.5618686 5.096703 0.2522273
width 7.971543e-07 1.062064e-07 0.6730683 7.505711 0.1599534
> summary(m6)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.406167e-07 6.683080e-08 0.01224581 5.096703 0.9461743
width 7.971543e-08 1.062064e-08 0.01229056 7.505711 0.9415154
> summary(m7)$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.406167e-08 6.683080e-09 0.0001784636 5.096703 0.9988162
width 7.971543e-09 1.062064e-09 0.0001784738 7.505711 0.9987471
A possible solution to this is to use another approximation method, Kenward-Roger. Let's take the model with the smallest transformation of the DV. We can do that with the following code:
summary(m7, ddf="Kenward-Roger")$coefficients
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 3.406167e-08 6.687077e-09 3.408815 5.093656 1.064475e-02
width 7.971543e-09 1.064752e-09 146.666335 7.486759 6.053660e-12
As you can see, with this method the numbers from the smallest version of our transformation now match the stable numbers from the larger transformations. Exactly why small numbers are a problem for the Satterthwaite approximation is beyond my understanding of the methods employed by lmerTest, but I know at least one of the authors is on here and might be able to provide additional insight. I suspect it might be related to numerical precision, as the estimated variance components become extremely small, but I can't be sure.
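Another workaround, which is my own suggestion rather than anything from the lmerTest documentation, is simply to rescale the DV into a sensible range before fitting; since the t-value is scale-invariant, the estimates and standard errors just need to be divided back by the same factor afterwards. A sketch with the same iris setup:

```r
library(lmerTest)
d <- iris
d$width <- d$Sepal.Width
# Suppose the DV arrived on a tiny scale, as in the question:
d$length_1e8 <- d$Sepal.Length / 1e8
# Rescale it back into a reasonable range before fitting; divide the
# resulting estimates and standard errors by 1e8 to recover the original scale.
m8 <- lmer(I(length_1e8 * 1e8) ~ width + (1 | Species), data = d)
summary(m8)$coefficients  # stable Satterthwaite df and p-values again
```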
I hope this helps!
Upvotes: 1