Marco Plebani

Reputation: 486

Power analysis with powerCurve (package simr) gives confusing output

In the example below I run a power analysis on the following dataset:

hh <- data.frame(Species=c(rep("SpA", 7),rep("SpB", 5),rep("SpC", 14),rep("SpD", 10),rep("SpE", 1)),
    Skull.length=c(13.100, 14.700, 14.200, 15.400, 15.300, 15.100, 15.200, 11.100, 11.500, 12.900, 12.500, 12.400, 12.700, 12.100, 13.200, 12.300, 11.335, 12.900, 12.500, 13.190, 12.900, 14.400, 14.400, 14.300, 14.100, 14.300, 12.600, 12.900, 12.900, 14.260, 13.670, 14.720, 14.440, 14.440, 15.350, 14.970, 10.300),
    Spine.length=c(59.200, 60.100, 60.600, 67.010, 70.000, 70.300, 70.800, 53.300, 53.800, 54.200, 54.300, 56.900, 55.300, 56.600, 57.800, 57.800, 58.365, 59.900, 60.000, 60.100, 60.200, 62.900, 63.600, 63.700, 66.200, 66.700, 55.300, 55.500, 59.300, 59.740, 61.330, 65.400, 65.600, 65.800, 66.650, 68.030, 52.100))

I will need these packages:

library(lme4)
library(lmerTest) # a pimped-up version of lme4 which also provides approximate p-values for the fixed effects.
library(MuMIn) # gives pseudo-R-squared via r.squaredGLMM()
library(pwr) # power analysis for lm
library(simr) # power analysis for generalized linear mixed models by simulation 

If I were to test the correlation between Skull.length and Spine.length, ignoring the role of Species, I would do:

lm1 <- lm(Skull.length~Spine.length, data=hh)
summary(lm1)$adj.r.squared # 0.7696584

Then a power analysis to test if my sample size is large enough would be easy with package pwr:

p.out <- pwr.r.test(r = sqrt(summary(lm1)$adj.r.squared), sig.level = 0.05, power = 0.8, alternative = "greater")
# To detect r = 0.8773018 or greater with sig.level = 0.05 and power = 0.8, n >= 6 is required

But I want to take into account hh$Species as in the model below:

mem.skull.vs.body <- glmer(Skull.length ~ Spine.length + (1| Species),
                            data=hh,
                            family="gaussian")

Which produces:

Fixed effects:
             Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)   0.73958    1.32239 23.50147   0.559    0.581    
Spine.length  0.20848    0.02173 22.72726   9.593 1.87e-09 ***

[Figure: data and linear regression with parameters from model mem.skull.vs.body]
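
Side note: with family="gaussian" and the default identity link, glmer() is just a shortcut to lmer(), so the model above is an ordinary linear mixed model, and the df and Pr(>|t|) columns in the table come from lmerTest. An equivalent direct fit (the object name mem.skull.vs.body.lmm is mine):

mem.skull.vs.body.lmm <- lmer(Skull.length ~ Spine.length + (1| Species), data=hh) # same model, fitted with lmerTest's lmer
summary(mem.skull.vs.body.lmm)$coefficients # reproduces the fixed-effects table above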

The slope of my model, 0.20848, is my measure of the effect size. To find out whether my sample size is large enough to detect an effect size of at least 0.1:

fixef(mem.skull.vs.body)["Spine.length"] <- 0.1 # simr's fixef<- sets the slope to the smallest effect size of interest
powerSim(mem.skull.vs.body, nsim=1000)

Which gives:

Power for predictor 'Spine.length', (95% confidence interval):
      98.90% (98.04, 99.45)
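
For intuition, powerSim is doing roughly this: simulate new responses from the model (with the slope now set to 0.1), refit, and count how often the slope test comes out significant. Below is a simplified sketch of that idea, not simr's actual implementation; in particular, it assumes base simulate() picks up the modified slope, bookkeeping that simr handles internally:

set.seed(1)
nsim <- 200
pvals <- replicate(nsim, {
    hh.sim <- hh
    hh.sim$Skull.length <- simulate(mem.skull.vs.body)[[1]] # new response drawn from the model
    fit <- lmer(Skull.length ~ Spine.length + (1| Species), data=hh.sim)
    coef(summary(fit))["Spine.length", "Pr(>|t|)"] # lmerTest's approximate p-value for the slope
})
mean(pvals < 0.05) # estimated power, analogous to powerSim's percentage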

This suggests that my sample size (37 individuals, each from one of five species) is plenty for the model I am testing, but when I proceeded to double-check with powerCurve(mem.skull.vs.body, nsim=1000) I obtained:

Power for predictor 'Spine.length', (95% confidence interval),
by largest value of Spine.length:
   53.8:  0.00% ( 0.00,  0.37) - 3 rows
   55.3:  5.40% ( 4.08,  6.99) - 7 rows
   57.8:  5.20% ( 3.91,  6.76) - 12 rows
   59.3: 12.30% (10.33, 14.50) - 15 rows
   60.1: 21.50% (18.99, 24.18) - 20 rows
  61.33: 30.60% (27.75, 33.56) - 23 rows
   65.4: 61.40% (58.30, 64.43) - 27 rows
   66.2: 80.00% (77.38, 82.44) - 30 rows
  68.03: 94.80% (93.24, 96.09) - 34 rows
   70.8: 98.40% (97.41, 99.08) - 37 rows

Here is a graph for the values above:

[Figure: power curve of the values above]
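
(Presumably the graph comes from simr's plot method for power curves; storing the curve in an object first is my assumption about how the figure was made:)

pc.default <- powerCurve(mem.skull.vs.body, nsim=1000) # the same call as above, kept in an object
plot(pc.default) # power against the default along-variable, Spine.length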

I find this output confusing, if not suspicious: it looks very much like powerCurve in its default setting conflates the size of the x values with the sample size. Is there a way to change the settings of powerCurve to avoid such confusion?


UPDATE (April 2019):

Since I asked this question, the package developers have modified function powerCurve to reflect the explanations provided below by pete.

Upvotes: 2

Views: 3052

Answers (2)

Xiaowei Song

Reputation: 161

To complement pete's answer and clear up the confusion: in Marco Plebani's simulation the curve is expanded along hh$Spine.length, so a value such as "66.2" is not a sample size but a spine length. In pete's simulation, the values of hh$obs do correspond to sample sizes. To find the sample size that gives 80% power, we can refine pete's solution a little:

mem.skull.vs.body2 <- update(mem.skull.vs.body, control=lmerControl(check.conv.singular = .makeCC(action = "ignore",  tol = 1e-4))) #disable singular warning message
powerCurve(mem.skull.vs.body2, along="obs", breaks=c(22, 23, 24, 25,26,27))
Calculating power at 10 sample sizes along Spine.length
Power for predictor 'Spine.length', (95% confidence interval),
by largest value of Spine.length:
   53.8:  0.00% ( 0.00,  0.37) - 3 rows
   55.3:  9.90% ( 8.12, 11.92) - 7 rows
   57.8: 18.90% (16.52, 21.47) - 12 rows
   59.3: 48.60% (45.46, 51.75) - 15 rows
   60.1: 78.30% (75.61, 80.82) - 20 rows
  61.33: 92.90% (91.13, 94.41) - 23 rows
   65.4: 99.50% (98.84, 99.84) - 27 rows
   66.2: 100.0% (99.63, 100.0) - 30 rows
  68.03: 100.0% (99.63, 100.0) - 34 rows
   70.8: 100.0% (99.63, 100.0) - 37 rows

I don't know why my simulated results differ so much from pete's. I switched to:

powerCurve(mem.skull.vs.body, along="obs", breaks=c(14,16,17,18,20,22))
Calculating power at 6 sample sizes along obs
Power for predictor 'Spine.length', (95% confidence interval),
by largest value of obs:
     14: 47.00% (36.94, 57.24) - 14 rows
     16: 61.00% (50.73, 70.60) - 16 rows
     17: 83.00% (74.18, 89.77) - 17 rows
     18: 98.00% (92.96, 99.76) - 18 rows
     20: 100.0% (96.38, 100.0) - 20 rows
     22: 100.0% (96.38, 100.0) - 22 rows

So it seems 17 samples suffice to give >= 80% power. To verify with 17 samples:

library(dplyr)
hh17 <- sample_n(hh, size=17, replace=FALSE) # draw a random 17-row subsample (unseeded, so results will vary)

model17 <- lmer(Skull.length ~ Spine.length + (1| Species), data=hh17)
powerSim(model17,nsim=100)
Power for predictor 'Spine.length', (95% confidence interval):
      97.00% (91.48, 99.38)

Test: unknown test
      Effect size for Spine.length is 0.17

Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 17

Time elapsed: 0 h 0 m 9 s    

The result above may be over-optimistic, since it is based on a single random subsample and only 100 simulations. With more simulations, 17 samples might still prove sufficient for more than 80% power.
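
One way to gauge that sampling variability is to repeat the subsample draw a few times (a sketch; it assumes summary() on a powerSim object returns a data frame with a mean column, as in pete's summary(pc) output):

set.seed(42)
replicate(5, {
    hh17 <- sample_n(hh, size=17, replace=FALSE) # a fresh random 17-row subsample
    m17 <- lmer(Skull.length ~ Spine.length + (1| Species), data=hh17)
    summary(powerSim(m17, nsim=100))$mean # estimated power for this draw
})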

Upvotes: 0

pete

Reputation: 2396

powerCurve takes an along argument that defaults to the first fixed covariate. Not every variable makes sense as the axis of a power curve, as this example shows.

In this case you might add an "observation" variable and run a power curve along that:

hh$obs <- 1:37 # an observation index, one value per row
pc <- powerCurve(mem.skull.vs.body, along="obs")

Then plot(pc) will give a more intuitive result.


If you want even more control over the plot, I would recommend using summary() to get the raw numbers and then plotting them as you see fit. Note that the nrow column is currently only available in the GitHub version (or in versions > 1.0.5 if you're reading this in the future).

summary(pc)
#    nrow nlevels successes trials mean     lower      upper
# 1     3       3         0    100 0.00 0.0000000 0.03621669
# 2     7       7         0    100 0.00 0.0000000 0.03621669
# 3    11      11         9    100 0.09 0.0419836 0.16398226
# 4    14      14        18    100 0.18 0.1103112 0.26947709
# 5    18      18        32    100 0.32 0.2302199 0.42076686
# 6    22      22        67    100 0.67 0.5688272 0.76080147
# 7    26      26        90    100 0.90 0.8237774 0.95099531
# 8    29      29        91    100 0.91 0.8360177 0.95801640
# 9    33      33        98    100 0.98 0.9296161 0.99756866
# 10   37      37        98    100 0.98 0.9296161 0.99756866
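
With those numbers in hand, finding the smallest sample size that reaches 80% power is a one-liner (using the column names shown above):

ps <- summary(pc)
min(ps$nrow[ps$mean >= 0.8]) # smallest nrow with estimated power >= 80%; 26 in this run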

Upvotes: 2
