SharkCaller

Reputation: 113

Perform several hypothesis tests at the same time (R)

Imagine I want to assess the normality of several variables from a given dataset using the Shapiro-Wilk test.

In the example data, I also want to group by species.

data(iris)
library(dplyr)
library(purrr)

iris %>% 
  select(Sepal.Length, Sepal.Width)%>%
  group_by(iris$Species)%>%
  lapply(. , shapiro.test)

This gives me the error:

Error in FUN(X[[i]], ...) : is.numeric(x) is not TRUE

I guess this error happens because lapply treats the three columns as the objects to test, instead of splitting the data by species, and since Species is not numeric, shapiro.test cannot be computed on it.

Any help would be appreciated.

Upvotes: 0

Views: 107

Answers (1)

Ronak Shah

Reputation: 388907

You can do this entirely with dplyr functions: keep Species as a column, group by it, and run the test inside summarise.
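The error in the question comes from lapply iterating over the columns of the data frame and ignoring the grouping. A quick check, reusing the pipeline from the question (the name `grouped` is only for illustration):

```r
library(dplyr)

# Same steps as in the question, stopped before lapply()
grouped <- iris %>%
  select(Sepal.Length, Sepal.Width) %>%
  group_by(iris$Species)

# A grouped tibble is still a list of columns, so lapply() visits each column
names(grouped)

# group_by() added the factor as a third column; shapiro.test() rejects it
sapply(grouped, is.numeric)
```

The third column is the factor created by group_by, which is what triggers `is.numeric(x) is not TRUE`.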

To apply shapiro.test to each of Sepal.Length and Sepal.Width within each species:

library(dplyr)

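result <- iris %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  group_by(Species) %>%
  # .cols is given explicitly; omitting it is deprecated since dplyr 1.1.0.
  # list() wraps each htest object so it fits in a single cell.
  summarise(across(everything(), ~ list(shapiro.test(.x))))
result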
  
#  Species    Sepal.Length Sepal.Width
#  <fct>      <list>       <list>     
#1 setosa     <htest>      <htest>    
#2 versicolor <htest>      <htest>    
#3 virginica  <htest>      <htest>    
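Each cell of those list columns holds a complete htest object, so a single result can be pulled out and inspected. A minimal sketch, assuming the rows come back in factor-level order (setosa first); `setosa_test` is a name chosen here for illustration:

```r
library(dplyr)

result <- iris %>%
  group_by(Species) %>%
  summarise(across(c(Sepal.Length, Sepal.Width), ~ list(shapiro.test(.x))))

# [[1]] selects the first row (setosa) of the Sepal.Length list column
setosa_test <- result$Sepal.Length[[1]]

setosa_test$statistic  # the W statistic
setosa_test$p.value    # the p-value
```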

To get only the p-values:

result <- iris %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  group_by(Species) %>%
  # extract just the p-value so each cell is a plain number
  summarise(across(everything(), ~ shapiro.test(.x)$p.value))
result

#  Species    Sepal.Length Sepal.Width
#  <fct>             <dbl>       <dbl>
#1 setosa            0.460       0.272
#2 versicolor        0.465       0.338
#3 virginica         0.258       0.181
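If you prefer one row per species and variable, a long-format variant is possible. This is only a sketch and assumes the tidyr package is installed; the column names `variable`, `value`, `W` and `p.value`, and the object name `long_result`, are choices made here, not part of the original answer:

```r
library(dplyr)
library(tidyr)

long_result <- iris %>%
  select(Sepal.Length, Sepal.Width, Species) %>%
  # stack the two measurement columns into variable/value pairs
  pivot_longer(-Species, names_to = "variable", values_to = "value") %>%
  group_by(Species, variable) %>%
  summarise(
    W = unname(shapiro.test(value)$statistic),
    p.value = shapiro.test(value)$p.value,
    .groups = "drop"
  )
long_result
```

This layout is easier to filter or plot, at the cost of running the test twice per group.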

Upvotes: 1
