Brandon Jablon

Reputation: 21

How to perform a Shapiro test with the group_by function

Type <- c("Bark", "Redwood", "Oak")
size <- c(10,15,13)
width <- c(3,4,5)
Ratio <- size/width
df <- data.frame(Type, size, width, Ratio)
mutate(df, ratio_log = log10(Ratio))
df %>% group_by(Type) %>% shapiro.test(ratio_log)

Error in shapiro.test(., ratio_log) : unused argument (ratio_log)

I am attempting to apply the Shapiro test to each type separately (e.g. Bark, Redwood, Oak), not to all the ratios combined. My real data set is larger and has many more ratios per type.

Upvotes: 1

Views: 7807

Answers (2)

Lodewic Van Twillert

Reputation: 782

You need at least dplyr, tidyr and purrr; loading tidyverse covers all three.

I also generated more samples for the example, because shapiro.test needs a vector of values per group (between 3 and 5000), not a single ratio. So here are 100 samples each from a normal, a binomial and a uniform distribution.

library(tidyverse)

Type <- c("Bark", "Redwood", "Oak")
size <- c(10,15,13)
width <- c(3,4,5)
Ratio <- c(rnorm(100),
           rbinom(100, size = 2, prob = 0.2),
           runif(100))

Put these in a data.frame

# Need minimum sample size for shapiro test
df <- data.frame(Type = rep(Type, each = 100),
                 Size = rep(size, each = 100),
                 width = rep(width, each = 100),
                 Ratio)

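Then you can create ratio_log. In this case I took the liberty of just reusing Ratio unchanged, since the simulated values include zeros and negatives, for which log10 would not give usable results. You can group by Type and use nest to collect each group's rows into its own nested data.frame.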

df %>%
  mutate(ratio_log = Ratio) %>%
  group_by(Type) %>%
  mutate(N_Samples = n()) %>%
  nest()

# A tibble: 3 x 2
  Type    data              
  <fct>   <list>            
1 Bark    <tibble [100 x 5]>
2 Redwood <tibble [100 x 5]>
3 Oak     <tibble [100 x 5]>

You can then use map inside mutate, which is basically lapply over the nested data.frames (or tibbles, which are essentially the same thing here). For each group's data.frame, we apply shapiro.test to the values in the ratio_log column.

# Use tidyr::nest and purrr::map to do shapiro tests per group
df.shapiro <- df %>%
  mutate(ratio_log = Ratio) %>%
  group_by(Type) %>%
  mutate(N_Samples = n()) %>%
  nest() %>%
  mutate(Shapiro = map(data, ~ shapiro.test(.x$ratio_log)))


# A tibble: 3 x 3
  Type    data               Shapiro    
  <fct>   <list>             <list>     
1 Bark    <tibble [100 x 5]> <S3: htest>
2 Redwood <tibble [100 x 5]> <S3: htest>
3 Oak     <tibble [100 x 5]> <S3: htest>

Now you have nested shapiro.test results, applied to each group.

To get the relevant parameters you can use glance from the broom package. Then unnest the result from the glance function.

# Use broom::glance and tidyr::unnest to get all relevant statistics
library(broom)
df.shapiro.glance <- df.shapiro %>%
  mutate(glance_shapiro = Shapiro %>% map(glance)) %>%
  unnest(glance_shapiro)

  Type    data               Shapiro     statistic  p.value method                     
  <fct>   <list>             <list>          <dbl>    <dbl> <fct>                      
1 Bark    <tibble [100 x 5]> <S3: htest>     0.967 1.30e- 2 Shapiro-Wilk normality test
2 Redwood <tibble [100 x 5]> <S3: htest>     0.638 2.45e-14 Shapiro-Wilk normality test
3 Oak     <tibble [100 x 5]> <S3: htest>     0.937 1.31e- 4 Shapiro-Wilk normality test
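
For comparison, here is a minimal base R sketch of the same idea, with no extra packages. It is an illustration under assumptions, not part of the approach above: the data frame `sim`, the seed, and the result name `shapiro_by_type` are all made up for this example, and the data is simulated in the same way as before.

```r
# Base R sketch: one Shapiro-Wilk test per group, no extra packages.
set.seed(42)  # arbitrary seed so the simulated data is reproducible

# Simulated stand-in data: 100 values per type (an assumption, not real measurements)
sim <- data.frame(
  Type = rep(c("Bark", "Redwood", "Oak"), each = 100),
  ratio_log = c(rnorm(100),
                rbinom(100, size = 2, prob = 0.2),
                runif(100))
)

# split() gives one numeric vector per Type; lapply() tests each vector
tests <- lapply(split(sim$ratio_log, sim$Type), shapiro.test)

# Collect the W statistic and p-value of each htest object into one data.frame
shapiro_by_type <- data.frame(
  Type = names(tests),
  statistic = vapply(tests, function(t) unname(t$statistic), numeric(1)),
  p.value = vapply(tests, function(t) t$p.value, numeric(1)),
  row.names = NULL
)
print(shapiro_by_type)
```

Note that split() orders the groups alphabetically by Type, so the rows may come out in a different order than in the nested tibble above.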

Upvotes: 7

Chris

Reputation: 1615

library(dplyr)

Type <- c("Bark", "Redwood", "Oak")
size <- c(10,15,13)
width <- c(3,4,5)
Ratio <- size/width
df <- data.frame(Type, size, width, Ratio)

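# shapiro.test() needs 3 to 5000 non-missing values per group,
# so this only runs on data with at least 3 rows per Type.
df %>% 
  mutate(ratio_log = log10(Ratio)) %>% 
  group_by(Type) %>% 
  summarise(statistic = shapiro.test(ratio_log)$statistic,
            p.value = shapiro.test(ratio_log)$p.value)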

You can also see other solutions here: purrr map a t.test onto a split df

Upvotes: 0
