Imagine I want to assess the normality of several variables from a given dataset using the Shapiro-Wilk test.
In the example data, I also want to group by species.
data(iris)
library(dplyr)
library(purrr)
iris %>%
  select(Sepal.Length, Sepal.Width) %>%
  group_by(iris$Species) %>%
  lapply(., shapiro.test)
This gives me the error:
Error in FUN(X[[i]], ...) : is.numeric(x) is not TRUE
I guess this error happens because lapply treats each of the three columns as an object to test instead of grouping by species, and since Species is not numeric, the test cannot be computed on it.
Any help would be appreciated.
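One way to check this guess is to rebuild the object that lapply() receives and look at its columns. The sketch below assumes dplyr is installed; grouped and col_is_numeric are illustrative names only.

```r
library(dplyr)

# Rebuild the object that reaches lapply() in the pipeline above
grouped <- iris %>%
  select(Sepal.Length, Sepal.Width) %>%
  group_by(iris$Species)

# lapply() walks over the columns of a data frame and ignores the grouping,
# so it sees three columns, the last one being the factor added by group_by()
names(grouped)
col_is_numeric <- vapply(grouped, is.numeric, logical(1))
col_is_numeric
```

The third column is not numeric, which is exactly what the is.numeric(x) check inside shapiro.test rejects.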
You can do this entirely with dplyr functions.
To apply shapiro.test to each of Sepal.Length and Sepal.Width within each species:
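library(dplyr)
iris %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  group_by(Species) %>%
  # everything() is explicit because across() without .cols is deprecated in dplyr >= 1.1.0;
  # grouping columns are excluded automatically, and each cell stores a full htest object
  summarise(across(everything(), ~ list(shapiro.test(.x)))) -> result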
result
# Species Sepal.Length Sepal.Width
# <fct> <list> <list>
#1 setosa <htest> <htest>
#2 versicolor <htest> <htest>
#3 virginica <htest> <htest>
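Because each cell holds the complete htest object, the W statistic and p-value can be pulled out afterwards. A minimal sketch, assuming dplyr 1.0 or later; setosa_length and length_p are illustrative names:

```r
library(dplyr)

# Same list-column result as above: one htest object per species and variable
result <- iris %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  group_by(Species) %>%
  summarise(across(everything(), ~ list(shapiro.test(.x))))

# Rows follow the factor levels of Species, so row 1 is setosa
setosa_length <- result$Sepal.Length[[1]]
setosa_length$statistic  # the W statistic
setosa_length$p.value

# Pull one field out of every test stored in a list column
length_p <- vapply(result$Sepal.Length, function(t) t$p.value, numeric(1))
length_p
```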
To get only the p-values:
iris %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  group_by(Species) %>%
  # Keep only the p-value of each test, giving plain numeric columns
  summarise(across(everything(), ~ shapiro.test(.x)$p.value)) -> result
result
# Species Sepal.Length Sepal.Width
# <fct> <dbl> <dbl>
#1 setosa 0.460 0.272
#2 versicolor 0.465 0.338
#3 virginica 0.258 0.181
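For comparison, the lapply() idea from the question also works once the rows are split by species first. A base-R sketch with no packages; by_species and p_values are illustrative names:

```r
# Split the two numeric columns into one data frame per species
by_species <- split(iris[, c("Sepal.Length", "Sepal.Width")], iris$Species)

# Outer sapply() loops over species, inner sapply() over columns;
# the result simplifies to a matrix of p-values (variables x species)
p_values <- sapply(by_species, function(df) {
  sapply(df, function(col) shapiro.test(col)$p.value)
})
p_values
```

The matrix has species as columns; t(p_values) puts species in rows to match the dplyr output.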