How to summarize grouped data with different functions that return different numbers of values?

Question

I am working on a grouped dataset and I want to add 4 summary statistics as 4 new columns: count, mean, CI lower, CI upper.

I summarized mean, CI lower and CI upper as follows:

library(Hmisc)
library(dplyr)

# summarize mean and confidence limits as three new columns
mtcars %>% group_by(vs, am) %>%
    do(
        as.data.frame(as.list(smean.cl.normal(.$mpg)))
    )
#      vs    am     Mean    Lower    Upper
#   <dbl> <dbl>    <dbl>    <dbl>    <dbl>
# 1     0     0 15.05000 13.28723 16.81277
# 2     0     1 19.75000 15.54295 23.95705
# 3     1     0 20.74286 18.45750 23.02822
# 4     1     1 28.37143 23.97129 32.77157

However, when I add the count, the new columns become 2 list columns:

df <- mtcars %>% group_by(vs, am) %>%
    do(
        n = length(.$mpg),
        stats = smean.cl.normal(.$mpg)
    )

# # A tibble: 4 × 4
#      vs    am         n     stats
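# * <dbl> <dbl>    <list>    <list>
# 1     0     0 <int [1]> <dbl [3]>
# 2     0     1 <int [1]> <dbl [3]>
# 3     1     0 <int [1]> <dbl [3]>
# 4     1     1 <int [1]> <dbl [3]>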

My desired output is:

#      vs    am     n     Mean    Lower    Upper
#   <dbl> <dbl> <int>    <dbl>    <dbl>    <dbl>
# 1     0     0    12 15.05000 13.28723 16.81277
# 2     0     1     6 19.75000 15.54295 23.95705
# 3     1     0     7 20.74286 18.45750 23.02822
# 4     1     1     7 28.37143 23.97129 32.77157

How should I achieve this conveniently?

Thanks in advance.

I also tried:

mtcars %>% group_by(vs, am) %>%
    do(
        as.data.frame(as.list(c(length(.$mpg), smean.cl.normal(.$mpg))))
    )

# Source: local data frame [4 x 8]
# Groups: vs, am [4]
# 
#      vs    am   X12     Mean    Lower    Upper    X6    X7
#   <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl> <dbl> <dbl>
# 1     0     0    12 15.05000 13.28723 16.81277    NA    NA
# 2     0     1    NA 19.75000 15.54295 23.95705     6    NA
# 3     1     0    NA 20.74286 18.45750 23.02822    NA     7
# 4     1     1    NA 28.37143 23.97129 32.77157    NA     7

This gives strange results.
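
The extra columns most likely come from the unnamed count: c() leaves the length without a name, so as.data.frame() appears to build a column name from the value itself (X12, X6, X7), and the per-group data frames no longer share their first column. A minimal sketch with made-up numbers standing in for the smean.cl.normal output, assuming it is acceptable to give the count an explicit name (n here):

```r
# Stand-in for smean.cl.normal(.$mpg): a named numeric vector (illustrative values)
stats <- c(Mean = 15.05, Lower = 13.29, Upper = 16.81)

# Unnamed count: the column name is derived from the value
unnamed <- as.data.frame(as.list(c(12, stats)))
names(unnamed)  # "X12" "Mean" "Lower" "Upper"

# Named count: every group would yield the same four column names
named <- as.data.frame(as.list(c(n = 12, stats)))
names(named)    # "n" "Mean" "Lower" "Upper"
```

With the name in place, the do() call above should be able to use as.data.frame(as.list(c(n = length(.$mpg), smean.cl.normal(.$mpg)))) and bind the groups into one n column.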

Jake Kaupp · Accepted Answer

You can accomplish this without do by using several tidyverse packages, namely tidyr, dplyr, purrr and broom.

The reason for avoiding do is that it will eventually be replaced by purrr.

The code below does the following:

1. Group by vs and am.
2. Nest mpg into a list column of data frames (named data).
3. Create the stats column and the n column as list columns.
4. Unnest the list columns into regular rows and columns.
5. Drop the data list column.

You do need some finagling to get the smean.cl.normal output into the proper form in step 3. My approach was to transform the output into a tidy data frame with broom::tidy, then use tidyr::spread to turn the rows into columns. The result is then in the proper tidy form for each vs/am group. This approach can probably be improved, and I hope suggestions will be posted in the comments.
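
As a rough illustration of that reshaping step, the sketch below builds the same long-then-wide shape with tibble::enframe and tidyr::pivot_wider in place of broom::tidy and tidyr::spread. The swap is an assumption made here for newer package versions, and the numbers are placeholders rather than real smean.cl.normal output:

```r
library(tibble)
library(tidyr)

# Placeholder for smean.cl.normal(x): a named numeric vector (illustrative values)
stats <- c(Mean = 15.05, Lower = 13.29, Upper = 16.81)

# Long form: one row per statistic, in columns `names` and `x`
long <- enframe(stats, name = "names", value = "x")

# Wide form: a single row with one column per statistic
wide <- pivot_wider(long, names_from = names, values_from = x)
```

A one-row data frame per group is what allows the later unnest step to line the statistics up as columns.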

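library(Hmisc)
library(tidyverse)
library(broom)  # tidy() lives in broom, which library(tidyverse) does not attach

mtcars %>% 
  group_by(vs, am) %>% 
  # one row per vs/am group, with mpg nested in a list column named data
  nest(mpg) %>% 
  # stats: named vector -> long data frame (names, x) -> one wide row per group
  mutate(stats = map(data, ~spread(tidy(smean.cl.normal(.x$mpg)), names, x)),
         # n: number of observations in each group
         n = map(data, nrow)) %>% 
  # expand both list columns into regular columns
  unnest(stats, n) %>% 
  # drop the nested data
  select(-data)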
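
One possible simplification, offered as a sketch rather than a drop-in replacement: in dplyr 1.0.0 or later, summarise() accepts an unnamed expression that returns a data frame and spreads it into columns, so the count and the interval can be computed in one call. mean_ci below is a hypothetical helper written for this example, not part of Hmisc; it assumes the t-based 95% interval that smean.cl.normal appears to report by default.

```r
library(dplyr)

# Hypothetical helper: mean with a t-based confidence interval, as a one-row data frame
mean_ci <- function(x, conf = 0.95) {
  n <- length(x)
  half_width <- qt((1 + conf) / 2, df = n - 1) * sd(x) / sqrt(n)
  data.frame(Mean = mean(x), Lower = mean(x) - half_width, Upper = mean(x) + half_width)
}

result <- mtcars %>%
  group_by(vs, am) %>%
  # n is a plain column; the unnamed data frame is unpacked into Mean, Lower, Upper
  summarise(n = n(), mean_ci(mpg), .groups = "drop")

result
```

If Hmisc is loaded, as_tibble(as.list(smean.cl.normal(mpg))) could presumably stand in for mean_ci(mpg).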
