Sundown Brownbear

Reputation: 571

ANOVA F statistic value in dplyr group by

I can summarize the mean by groups using

    t(mtcars %>%
        group_by(gear) %>%
        dplyr::summarize(Mean_Mpg = mean(mpg, na.rm = TRUE),
                         StdD_Mpg = sd(mpg, na.rm = TRUE)))

    gear      3         4         5
    Mean_Mpg  16.106667 24.533333 21.380000
    StdD_Mpg  3.371618  5.276764  6.658979

I know summary(aov(gear ~ mpg, mtcars)) outputs the results of the ANOVA test, including the F statistic.

                Df Sum Sq Mean Sq F value Pr(>F)
    mpg          1  3.893   3.893   8.995 0.0054 **
    Residuals   30 12.982   0.433

Also, chisq.test(table(mtcars$gear, mtcars$carb)) outputs the results of the chi-square test.

    Pearson's Chi-squared test

    X-squared = 16.518, df = 10, p-value = 0.08573

What I am trying to do is produce output like the table below, which combines the mean and standard deviation with the F statistic from the ANOVA and the X-squared statistic from the chi-square test.

     gear            3         4         5          Test-Statistic   Test
    Mpg (Mean)       16.106667 24.533333 21.380000   8.995           ANOVA
        (StdD)       3.371618  5.276764  6.658979
    Carb(N)                                          16.518          Chi.Square
                     3         4         0
                     4         4         2
                     3         0         0
                     5         4         1
                     0         0         1
                     0         0         1

I am not sure how to put together a table like this by combining the mean, standard deviation, F statistic, chi-square statistic, and so on. I would welcome any help from the community on formatting the results like this.

Upvotes: 0

Views: 2421

Answers (1)

demarsylvain

Reputation: 2185

One option is to decide which results you want and how to reshape them so that they all share the same structure. Then use bind_rows(), for instance, to gather all the results into one table.

The functions group_by() and summarise() can calculate the mean (and other statistics) for several variables at once, and the result is a data.frame. The function apply(), on the other hand, applies the same function, or a combination of functions such as summary(aov(...)), to several variables, and its result is a vector.
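As a minimal sketch of the second building block, this is one way to pull a single number out of each test object for one variable. The object names (fit, f_value, chi, x_squared) are only illustrative, and suppressWarnings() is my own addition, on the assumption that chisq.test() warns about small expected counts for this table.

```r
# summary() of an aov fit is a list whose first element is the ANOVA table;
# the F statistic of the first term sits in its "F value" column
fit <- aov(gear ~ mpg, data = mtcars)
f_value <- summary(fit)[[1]][["F value"]][1]

# chisq.test() returns an "htest" object; the test statistic is in $statistic
chi <- suppressWarnings(chisq.test(table(mtcars$gear, mtcars$carb)))
x_squared <- unname(chi$statistic)

round(c(F = f_value, X2 = x_squared), 3)
```

The two rounded values should match the 8.995 and 16.518 quoted in the question. Once each statistic is a plain number, it can be turned into a character string and bound to the summary table as an extra row.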

library(tidyverse)

  # mean (± standard error, i.e. sd / sqrt(n)) of x per group
mtcars %>%
  group_by(gear) %>%
  summarise_at(
    vars(mpg, carb),
    funs(paste0(round(mean(.), 2), '(±', round(sd(.) / sqrt(n()), 1), ')'))
  ) %>% 
  mutate(gear = as.character(gear)) %>% 

  # add ANOVA: gear ~ x
  bind_rows(
    c(gear = 'ANOVA',
      apply(mtcars %>% select(mpg, carb), 2, 
            function(x) summary(aov(mtcars$gear ~ x))[[1]]$`F value`[1] %>% round(3) %>% as.character()
      ))
  ) %>% 

  # add Chi-Square: gear ~ x
  bind_rows(
    c(gear = 'CHI-SQUARE',
      apply(mtcars %>% select(mpg, carb), 2, 
            function(x) chisq.test(table(mtcars$gear, x))$statistic %>% round(3) %>% as.character()
      ))
  )

# # A tibble: 5 x 3
#   gear       mpg         carb      
#   <chr>      <chr>       <chr>     
# 1 3          16.11(±0.9) 2.67(±0.3)
# 2 4          24.53(±1.5) 2.33(±0.4)
# 3 5          21.38(±3)   4.4(±1.2) 
# 4 ANOVA      8.995       2.436     
# 5 CHI-SQUARE 54.667      16.518
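The code above reports the standard error and relies on funs(), which newer dplyr releases deprecate. A hedged variant of the same idea, assuming dplyr 1.0 or later, uses across() and reports the standard deviation the question asked for. The names vars, desc, stat_row and result are hypothetical helpers introduced only for this sketch, and suppressWarnings() is again my addition.

```r
library(dplyr)

vars <- c("mpg", "carb")

# Per-group "mean (sd)" strings for each variable
desc <- mtcars %>%
  group_by(gear) %>%
  summarise(across(all_of(vars),
                   ~ paste0(round(mean(.x), 2), " (", round(sd(.x), 2), ")"))) %>%
  mutate(gear = as.character(gear))

# Build a one-row tibble: a label plus one rounded statistic per variable
stat_row <- function(label, stat_fun) {
  stats <- sapply(vars, function(v) as.character(round(stat_fun(v), 3)))
  as_tibble(c(list(gear = label), as.list(stats)))
}

result <- bind_rows(
  desc,
  # ANOVA: gear ~ x, F statistic of the first term
  stat_row("ANOVA", function(v)
    summary(aov(reformulate(v, response = "gear"), data = mtcars))[[1]][["F value"]][1]),
  # Chi-square: gear vs x, X-squared statistic
  stat_row("CHI-SQUARE", function(v)
    unname(suppressWarnings(chisq.test(table(mtcars$gear, mtcars[[v]])))$statistic))
)

result
```

The test rows should carry the same statistics as the table above. Note that the chi-square value for mpg treats a continuous variable as categorical, so it is probably only meaningful for carb.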

Upvotes: 1
