Filiy

Reputation: 21

Why does lmerTest give different p-values when the data are very small?

I'm new to statistics and to this package. I expected the p-value to stay the same if all of my data are multiplied or divided by the same number, for example, all *10 or all *100.

But because my data are very small (~10^-9), the p-value is almost 1 to begin with. When I multiply the data (the 'x' in the model) by powers of 10, the p-value decreases until the data reach ~10^-5, after which it no longer changes.

test <- lmer(x ~ a + b + c + (1 | rep), data = data)

   Estimate   Std. Error     df    t value   Pr(>|t|)
a2  -5.783e-09  1.232e-09  8.879e-05  -4.693    0.999            (raw data)
a2  -5.783e-08  1.232e-08  6.177e-03  -4.693    0.971            (raw data*10)
a2  -5.783e-07  1.232e-07  3.473e-01  -4.693    0.397            (raw data*100)
a2  -5.783e-06  1.232e-06  7.851e+00  -4.693  0.00164 **         (raw data*1000)
a2  -5.783e-05  1.232e-05  9.596e+01  -4.693 8.95e-06 ***        (raw data*10000)
a2  -0.0005783  0.0001232 95.9638425  -4.693 8.95e-06 ***        (raw data*100000)

I don't understand why these p-values change and then become constant. Could someone kindly explain this to me?

Upvotes: 2

Views: 151

Answers (1)

sjp

Reputation: 910

Okay, so after a bit of digging, I think I have found a solution and an explanation. As you can see in your example, the t-value is not changing; the changes in p-value are due to changes in the estimated degrees of freedom. The default method for estimating these is the Satterthwaite approximation, which, according to a post from one of the package authors, depends on the scale of the dependent variable (see the post here: https://stats.stackexchange.com/questions/342848/satterthwaite-degrees-of-freedom-in-a-mixed-model-change-drastically-depending-o).

Now, within a normal range of orders of magnitude, the degrees of freedom do not change and the p-values remain constant. You approached this from the other direction in your example, noting that the numbers stopped changing after a certain point (once the numbers in the DV were sufficiently large). Here I show that they are stable using the iris dataset included with R:

# Preparing data
library(lmerTest)  # provides lmer() with Satterthwaite df and p-values

d <- iris
d$width <- d$Sepal.Width
d$Species <- as.factor(d$Species)

# Creating slightly smaller versions of the DV
d$length <- d$Sepal.Length
d$length_10 <- d$Sepal.Length/10
d$length_1e2 <- d$Sepal.Length/1e2
d$length_1e3 <- d$Sepal.Length/1e3

# fitting the models
m1 <- lmer(length ~ width + (1 | Species), data = d)
m2 <- lmer(length_10 ~ width + (1 | Species), data = d)
m3 <- lmer(length_1e2 ~ width + (1 | Species), data = d)
m4 <- lmer(length_1e3 ~ width + (1 | Species), data = d)

# The coefficients are all the same
> summary(m1)$coefficients
             Estimate Std. Error         df  t value     Pr(>|t|)
(Intercept) 3.4061671  0.6683080   3.405002 5.096703 1.065543e-02
width       0.7971543  0.1062064 146.664820 7.505711 5.453404e-12
> summary(m2)$coefficients
              Estimate Std. Error         df  t value     Pr(>|t|)
(Intercept) 0.34061671 0.06683080   3.405002 5.096703 1.065543e-02
width       0.07971543 0.01062064 146.664820 7.505711 5.453404e-12
> summary(m3)$coefficients
               Estimate  Std. Error         df  t value     Pr(>|t|)
(Intercept) 0.034061671 0.006683079   3.405003 5.096703 1.065542e-02
width       0.007971543 0.001062064 146.664820 7.505711 5.453405e-12
> summary(m4)$coefficients
                Estimate   Std. Error         df  t value     Pr(>|t|)
(Intercept) 0.0034061671 0.0006683079   3.405003 5.096703 1.065542e-02
width       0.0007971543 0.0001062064 146.664820 7.505711 5.453405e-12

However, your numbers are much smaller than this, so I made much smaller versions of the DV to re-create your example. As you can see, the degrees of freedom approach zero, which pushes the p-values towards one.

# Much smaller numbers
d$length_1e6 <- d$Sepal.Length/1e6
d$length_1e7 <- d$Sepal.Length/1e7
d$length_1e8 <- d$Sepal.Length/1e8

# fitting the models
m5 <- lmer(length_1e6 ~ width + (1 | Species), data = d)
m6 <- lmer(length_1e7 ~ width + (1 | Species), data = d)
m7 <- lmer(length_1e8 ~ width + (1 | Species), data = d)

# Here we recreate the problem
> summary(m5)$coefficients
                Estimate   Std. Error        df  t value  Pr(>|t|)
(Intercept) 3.406167e-06 6.683079e-07 0.5618686 5.096703 0.2522273
width       7.971543e-07 1.062064e-07 0.6730683 7.505711 0.1599534
> summary(m6)$coefficients
                Estimate   Std. Error         df  t value  Pr(>|t|)
(Intercept) 3.406167e-07 6.683080e-08 0.01224581 5.096703 0.9461743
width       7.971543e-08 1.062064e-08 0.01229056 7.505711 0.9415154
> summary(m7)$coefficients
                Estimate   Std. Error           df  t value  Pr(>|t|)
(Intercept) 3.406167e-08 6.683080e-09 0.0001784636 5.096703 0.9988162
width       7.971543e-09 1.062064e-09 0.0001784738 7.505711 0.9987471

A possible solution is to use another approximation method, Kenward-Roger. Taking the model with the smallest transformation of the DV, we can do that with the following code:

summary(m7, ddf="Kenward-Roger")$coefficients
                Estimate   Std. Error         df  t value     Pr(>|t|)
(Intercept) 3.406167e-08 6.687077e-09   3.408815 5.093656 1.064475e-02
width       7.971543e-09 1.064752e-09 146.666335 7.486759 6.053660e-12

As you can see, with this method the numbers from the smallest version of our transformation now match the stable numbers from the larger transformations. Understanding exactly why small numbers are a problem for the Satterthwaite method is beyond my understanding of the methods employed by lmerTest, but I know at least one of the authors is on here and might be able to provide additional insight. I suspect it is related to numerical underflow, as your numbers are very small, but I can't be sure.
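Beyond switching to Kenward-Roger, a simpler workaround (a sketch, reusing the same iris-based setup as above) is to rescale the response into an ordinary range before fitting. The t-values are invariant to scaling the DV, so only the units of the estimates and standard errors change, while the Satterthwaite df are computed on well-conditioned numbers:

```r
# Sketch: rescale the tiny DV before fitting (assumes lmerTest is installed)
library(lmerTest)

d <- iris
d$width <- d$Sepal.Width
d$length_1e8 <- d$Sepal.Length / 1e8   # the problematic tiny DV

# Multiply the DV back into a sensible range before fitting; the estimate
# and SE scale by the same factor, so the t-value (and hence the p-value,
# given stable df) is unaffected by the choice of factor.
d$length_rescaled <- d$length_1e8 * 1e8
m_rescaled <- lmer(length_rescaled ~ width + (1 | Species), data = d)

summary(m_rescaled)$coefficients
```

To recover estimates in the original units, divide the fixed-effect estimates and standard errors by the rescaling factor; the df, t-values, and p-values carry over unchanged.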

I hope this helps!

Upvotes: 1
