Reputation: 486
In the following example I execute a power analysis on the following dataset:
hh <- data.frame(Species=c(rep("SpA", 7),rep("SpB", 5),rep("SpC", 14),rep("SpD", 10),rep("SpE", 1)),
Skull.length=c(13.100, 14.700, 14.200, 15.400, 15.300, 15.100, 15.200, 11.100, 11.500, 12.900, 12.500, 12.400, 12.700, 12.100, 13.200, 12.300, 11.335, 12.900, 12.500, 13.190, 12.900, 14.400, 14.400, 14.300, 14.100, 14.300, 12.600, 12.900, 12.900, 14.260, 13.670, 14.720, 14.440, 14.440, 15.350, 14.970, 10.300),
Spine.length=c(59.200, 60.100, 60.600, 67.010, 70.000, 70.300, 70.800, 53.300, 53.800, 54.200, 54.300, 56.900, 55.300, 56.600, 57.800, 57.800, 58.365, 59.900, 60.000, 60.100, 60.200, 62.900, 63.600, 63.700, 66.200, 66.700, 55.300, 55.500, 59.300, 59.740, 61.330, 65.400, 65.600, 65.800, 66.650, 68.030, 52.100))
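A quick tally of the group sizes shows how unbalanced the species are; note in particular that SpE contributes a single individual (a minimal check, using base R only):
table(hh$Species)
# SpA SpB SpC SpD SpE
#   7   5  14  10   1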
I will need these packages:
library(lme4)
library(lmerTest) # a pimped-up version of lme4 which also provides approximate p-values
library(MuMIn) # gives pseudo-R-squared via r.squaredGLMM()
library(pwr) # power analysis for lm
library(simr) # power analysis for generalized linear mixed models by simulation
If I were to test the correlation between Skull.length and Spine.length, ignoring the role of Species, I would do:
lm1 <- lm(Skull.length~Spine.length, data=hh)
summary(lm1)$adj.r.squared # 0.7696584
Then a power analysis to test if my sample size is large enough would be easy with package pwr:
p.out <- pwr.r.test(r = sqrt(summary(lm1)$adj.r.squared), sig.level = 0.05, power = 0.8, alternative = "greater")
# To detect r = 0.8773018 or greater with sig.level = 0.05 and power = 0.8, n >= 6 is required
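Conversely, leaving power unspecified and supplying the actual sample size returns the achieved power; pwr.r.test solves for whichever of n, r, and power is left out (a sketch, not run here):
pwr.r.test(n = 37, r = sqrt(summary(lm1)$adj.r.squared), sig.level = 0.05, alternative = "greater")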
But I want to take into account hh$Species
as in the model below:
mem.skull.vs.body <- lmer(Skull.length ~ Spine.length + (1 | Species),
                          data = hh) # gaussian mixed model: lmer, since glmer with family = "gaussian" is deprecated
Which produces:
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.73958 1.32239 23.50147 0.559 0.581
Spine.length 0.20848 0.02173 22.72726 9.593 1.87e-09 ***
[Figure: the data with the fitted regression line from model mem.skull.vs.body]
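Since MuMIn is loaded for exactly this purpose, the pseudo-R-squared of the mixed model can be inspected as well (a sketch; R2m measures the fixed effect alone, R2c the fixed plus random effects):
r.squaredGLMM(mem.skull.vs.body) # returns marginal (R2m) and conditional (R2c) values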
The slope of my model, 0.20848, is my measure of the effect size. To find out the sample size required to detect an effect size of at least 0.1:
fixef(mem.skull.vs.body)["Spine.length"] <- 0.1
powerSim(mem.skull.vs.body, nsim=1000)
Which gives:
Power for predictor 'Spine.length', (95% confidence interval):
98.90% (98.04, 99.45)
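For reproducibility, the same simulation can be pinned to a seed and made explicit about which effect is tested (a sketch; fixed() is simr's test specification, and the seed value is arbitrary):
set.seed(42) # arbitrary seed, for reproducibility only
powerSim(mem.skull.vs.body, test = fixed("Spine.length"), nsim = 1000)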
This suggests that my sample size (37 individuals, each from one of five species) is plenty for the model I am testing, but when I proceeded to double-check with powerCurve(mem.skull.vs.body, nsim=1000) I obtained:
Power for predictor 'Spine.length', (95% confidence interval),
by largest value of Spine.length:
53.8: 0.00% ( 0.00, 0.37) - 3 rows
55.3: 5.40% ( 4.08, 6.99) - 7 rows
57.8: 5.20% ( 3.91, 6.76) - 12 rows
59.3: 12.30% (10.33, 14.50) - 15 rows
60.1: 21.50% (18.99, 24.18) - 20 rows
61.33: 30.60% (27.75, 33.56) - 23 rows
65.4: 61.40% (58.30, 64.43) - 27 rows
66.2: 80.00% (77.38, 82.44) - 30 rows
68.03: 94.80% (93.24, 96.09) - 34 rows
70.8: 98.40% (97.41, 99.08) - 37 rows
[Figure: graph of the power values above]
I find this output confusing if not suspicious, because:
- it does not agree with the ~99% power estimated by powerSim() above;
- the values in the leftmost column are not sample sizes but values of hh$Spine.length, which are between 52.1 and 70.8.
It looks very much like function powerCurve in its default setting is confusing the size of the x values with the sample size. Is there a way to change the settings of powerCurve to avoid such confusion?
UPDATE (April 2019): since I asked this question, the package developers have modified function powerCurve to reflect the explanations provided below by pete.
Upvotes: 2
Views: 3052
Reputation: 161
To complement pete's answer and clear up the source of the confusion: in Marco Plebani's simulation the power curve runs along hh$Spine.length, so a value such as "66.2" is not a sample size but a spine length. In pete's simulation, the values of hh$obs do correspond to numbers of observations. To find the sample size giving at least 80% power, we can refine pete's solution a little. First I silenced the singular-fit warnings and reran the default power curve:
mem.skull.vs.body2 <- update(mem.skull.vs.body, control=lmerControl(check.conv.singular = .makeCC(action = "ignore", tol = 1e-4))) #disable singular warning message
powerCurve(mem.skull.vs.body2, nsim=1000)
Calculating power at 10 sample sizes along Spine.length
Power for predictor 'Spine.length', (95% confidence interval),
by largest value of Spine.length:
53.8: 0.00% ( 0.00, 0.37) - 3 rows
55.3: 9.90% ( 8.12, 11.92) - 7 rows
57.8: 18.90% (16.52, 21.47) - 12 rows
59.3: 48.60% (45.46, 51.75) - 15 rows
60.1: 78.30% (75.61, 80.82) - 20 rows
61.33: 92.90% (91.13, 94.41) - 23 rows
65.4: 99.50% (98.84, 99.84) - 27 rows
66.2: 100.0% (99.63, 100.0) - 30 rows
68.03: 100.0% (99.63, 100.0) - 34 rows
70.8: 100.0% (99.63, 100.0) - 37 rows
I don't know why my simulated results differ so much from pete's. I then switched to running the curve along the obs variable, with breaks around the region of interest:
powerCurve(mem.skull.vs.body, along="obs", breaks=c(14,16,17,18,20,22))
Calculating power at 6 sample sizes along obs
Power for predictor 'Spine.length', (95% confidence interval),
by largest value of obs:
14: 47.00% (36.94, 57.24) - 14 rows
16: 61.00% (50.73, 70.60) - 16 rows
17: 83.00% (74.18, 89.77) - 17 rows
18: 98.00% (92.96, 99.76) - 18 rows
20: 100.0% (96.38, 100.0) - 20 rows
22: 100.0% (96.38, 100.0) - 22 rows
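The same threshold can be read off programmatically by storing the curve and querying its summary (a sketch; with along = "obs" the nlevels column of summary() holds the sample sizes):
pc.obs <- powerCurve(mem.skull.vs.body, along = "obs", breaks = c(14, 16, 17, 18, 20, 22))
s <- summary(pc.obs)
min(s$nlevels[s$mean >= 0.8]) # smallest number of observations reaching 80% power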
Then it seems 17 samples suffice to give >=80% power. To verify, I refit the model on a random subsample of 17 rows:
library(dplyr)
hh17 <- sample_n(hh, size = 17, replace = FALSE) # random subsample of 17 individuals
model17 <- lmer(Skull.length ~ Spine.length + (1 | Species), data = hh17)
powerSim(model17, nsim = 100)
Power for predictor 'Spine.length', (95% confidence interval):
97.00% (91.48, 99.38)
Test: unknown test
Effect size for Spine.length is 0.17
Based on 100 simulations, (0 warnings, 0 errors)
alpha = 0.05, nrow = 17
Time elapsed: 0 h 0 m 9 s
The result above looks over-optimistic, since it rests on a single random subsample and only 100 simulations. With more simulations, 17 samples might indeed turn out to give more than 80% power.
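One way to guard against a lucky subsample is to repeat the subsampling several times and look at the spread of the power estimates (a sketch, assuming summary() on a powerSim object returns a mean column as it does for powerCurve; slow, as each repeat runs its own simulation):
set.seed(1) # arbitrary seed, for reproducibility only
reps <- replicate(5, {
  hh17 <- dplyr::sample_n(hh, size = 17, replace = FALSE) # fresh random subsample
  m17 <- lmer(Skull.length ~ Spine.length + (1 | Species), data = hh17)
  summary(powerSim(m17, nsim = 100, progress = FALSE))$mean # estimated power
})
reps # spread of power estimates across random subsamples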
Upvotes: 0
Reputation: 2396
powerCurve takes an along argument that defaults to the first fixed covariate. Not all variables make sense, as can be seen in this example.
In this case you might add an "observation" variable and run a power curve along that:
hh$obs <- 1:37 # one value per row, so the curve now grows one observation at a time
pc <- powerCurve(mem.skull.vs.body, along="obs")
Then plot(pc) will give a more intuitive result.
If you want even more control over the plot, I would recommend using summary to get the raw numbers and then plotting them as you see fit. Note that the nrow column is currently only available in the github version (or in versions > 1.0.5 if you're reading this in the future).
summary(pc)
# nrow nlevels successes trials mean lower upper
# 1 3 3 0 100 0.00 0.0000000 0.03621669
# 2 7 7 0 100 0.00 0.0000000 0.03621669
# 3 11 11 9 100 0.09 0.0419836 0.16398226
# 4 14 14 18 100 0.18 0.1103112 0.26947709
# 5 18 18 32 100 0.32 0.2302199 0.42076686
# 6 22 22 67 100 0.67 0.5688272 0.76080147
# 7 26 26 90 100 0.90 0.8237774 0.95099531
# 8 29 29 91 100 0.91 0.8360177 0.95801640
# 9 33 33 98 100 0.98 0.9296161 0.99756866
# 10 37 37 98 100 0.98 0.9296161 0.99756866
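For example, a quick base-graphics version (a sketch; nlevels equals nrow here and is available in released versions):
s <- summary(pc)
plot(s$nlevels, s$mean, type = "b", ylim = c(0, 1),
     xlab = "number of observations", ylab = "estimated power")
segments(s$nlevels, s$lower, s$nlevels, s$upper) # 95% confidence intervals
abline(h = 0.8, lty = 2) # conventional 80% power target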
Upvotes: 2