I am working with a large dataset of offspring sex ratios from more than 36,000 individuals of over 1,000 species. I want to test whether the median sex ratio of each species differs significantly from 0.5, and I am using a one-sample Wilcoxon signed-rank test to do this. Here is an example dataset:
n<-100
dat<-data.frame(species=rep(LETTERS[1:5],n/5), SR=sample((1:100)/100,n,replace=TRUE))
When I run the following code, I get results where all p-values are the same.
library(dplyr)
res <- dat %>% group_by(species) %>%
do(w=wilcox.test(dat$SR,mu=.5,alternative=("two.sided"))) %>%
summarize(species,wilcox=w$p.value)
res
# Output:
# A tibble: 5 x 2
  species wilcox
  <chr>    <dbl>
1 A        0.465
2 B        0.465
3 C        0.465
4 D        0.465
5 E        0.465
Any idea what I'm doing wrong and how I can fix this?
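All the p-values are identical because dat$SR inside do() refers to the complete SR column of dat, not to the rows of the current group, so the same test on all values is repeated for every species. Inside do(), the current group's data is available as ., so .$SR would have given per-species results. However, do() is superseded, so it is better not to use it in new code. You can do the same within summarize() with across().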
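For comparison, here is a minimal sketch of the smallest change to the original do() call: replace dat$SR with .$SR so each test only sees the current species. The set.seed(1) call, exact = FALSE and the object name res_do are additions for this illustration, not part of the original code.

```r
library(dplyr)

set.seed(1)  # added so the example is reproducible
n <- 100
dat <- data.frame(species = rep(LETTERS[1:5], n / 5),
                  SR = sample((1:100) / 100, n, replace = TRUE))

# Inside do(), `.` is the subset of rows for the current group.
# Using dat$SR instead would test the whole column for every species.
res_do <- dat %>%
  group_by(species) %>%
  do(data.frame(wilcox = wilcox.test(.$SR, mu = 0.5, exact = FALSE)$p.value))

res_do
```

The p-value for species A should match a direct call such as wilcox.test(dat$SR[dat$species == "A"], mu = 0.5, exact = FALSE).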
First, group by species. Then use across() within summarize() to access the values of each group, run the Wilcoxon test on them, and extract the p-value directly with $p.value at the end of the expression.
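Note that I set exact = FALSE. Because each species in this example has only 20 observations (fewer than 50), wilcox.test() tries to compute an exact p-value by default, which it cannot do when there are tied values (or values exactly equal to mu); it then issues a warning and falls back to the normal approximation. Setting exact = FALSE requests the approximation directly and avoids the warning. With your real data, species with 50 or more observations already use the approximation by default, so the argument only matters for the smaller samples. The help page ?wilcox.test describes when exact p-values are computed.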
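With more than 1,000 species, it may help to check which ones fall below the 50-observation threshold, since those are the species where exact = FALSE actually changes the computation. This is a sketch on the example data; the column names n_obs and default_exact are hypothetical, and set.seed(1) is added for reproducibility.

```r
library(dplyr)

set.seed(1)  # added so the example is reproducible
n <- 100
dat <- data.frame(species = rep(LETTERS[1:5], n / 5),
                  SR = sample((1:100) / 100, n, replace = TRUE))

# Per-species sample size; with fewer than 50 observations wilcox.test()
# defaults to an exact p-value (and warns if ties or zeroes prevent it).
sizes <- dat %>%
  group_by(species) %>%
  summarize(n_obs = n(),
            default_exact = n_obs < 50)

sizes
```

On the real data, filter(sizes, default_exact) would list the species affected by the exact argument.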
n <- 100
dat <- data.frame(species = rep(LETTERS[1:5], n / 5),
                  SR = sample((1:100) / 100, n, replace = TRUE))
library(dplyr)
dat %>%
  group_by(species) %>%
  summarize(wilcox = across(SR,
                            ~ wilcox.test(.,
                                          mu = .5,
                                          alternative = "two.sided",
                                          exact = FALSE)$p.value)$SR)
#> # A tibble: 5 × 2
#> species wilcox$SR
#> <chr> <dbl>
#> 1 A 0.737
#> 2 B 0.0105
#> 3 C 0.751
#> 4 D 0.380
#> 5 E 0.614
Created on 2022-08-19 with reprex v2.0.2
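Because only one column is tested, across() is not strictly required: inside summarize(), SR already refers to the current group's values. The sketch below shows that variant, with the per-species median and sample size added for context, and a base R equivalent using split() and sapply(). The column names n_obs, median_SR and p_value and the set.seed(1) call are my own choices for illustration.

```r
library(dplyr)

set.seed(1)  # added so the example is reproducible
n <- 100
dat <- data.frame(species = rep(LETTERS[1:5], n / 5),
                  SR = sample((1:100) / 100, n, replace = TRUE))

# dplyr: SR is the current species' values within summarize()
res <- dat %>%
  group_by(species) %>%
  summarize(n_obs = n(),
            median_SR = median(SR),
            p_value = wilcox.test(SR, mu = 0.5, exact = FALSE)$p.value)
res

# Base R: split() gives one SR vector per species, sapply() tests each one
p_values <- sapply(split(dat$SR, dat$species),
                   function(x) wilcox.test(x, mu = 0.5, exact = FALSE)$p.value)
p_values
```

Both approaches should give the same p-values, in the same species order.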