mt1022

How to summarize grouped data with functions that return different numbers of values?

I am working on a grouped dataset and want to add four summary statistics as four new columns: count, mean, lower confidence limit, and upper confidence limit.

I computed the mean and the lower and upper confidence limits as follows:

library(Hmisc)
library(dplyr)

# summarize the mean and 95% confidence limits as three new columns (Mean, Lower, Upper)
mtcars %>% group_by(vs, am) %>%
    do(
        as.data.frame(as.list(smean.cl.normal(.$mpg)))
    )
#      vs    am     Mean    Lower    Upper
#   <dbl> <dbl>    <dbl>    <dbl>    <dbl>
# 1     0     0 15.05000 13.28723 16.81277
# 2     0     1 19.75000 15.54295 23.95705
# 3     1     0 20.74286 18.45750 23.02822
# 4     1     1 28.37143 23.97129 32.77157

However, when I add the count, I get two list-columns instead:

df <- mtcars %>% group_by(vs, am) %>%
    do(
        n = length(.$mpg),
        stats = smean.cl.normal(.$mpg)
    )

# # A tibble: 4 × 4
#      vs    am         n     stats
# * <dbl> <dbl>    <list>    <list>
# 1     0     0 <int [1]> <dbl [3]>
# 2     0     1 <int [1]> <dbl [3]>
# 3     1     0 <int [1]> <dbl [3]>
# 4     1     1 <int [1]> <dbl [3]>

My desired output is:

#      vs    am     n     Mean    Lower    Upper
#   <dbl> <dbl> <int>    <dbl>    <dbl>    <dbl>
# 1     0     0    12 15.05000 13.28723 16.81277
# 2     0     1     6 19.75000 15.54295 23.95705
# 3     1     0     7 20.74286 18.45750 23.02822
# 4     1     1     7 28.37143 23.97129 32.77157

How should I achieve this conveniently?

Thanks in advance.


I also tried:

mtcars %>% group_by(vs, am) %>%
    do(
        as.data.frame(as.list(c(length(.$mpg), smean.cl.normal(.$mpg))))
    )

# Source: local data frame [4 x 8]
# Groups: vs, am [4]
# 
#      vs    am   X12     Mean    Lower    Upper    X6    X7
#   <dbl> <dbl> <dbl>    <dbl>    <dbl>    <dbl> <dbl> <dbl>
# 1     0     0    12 15.05000 13.28723 16.81277    NA    NA
# 2     0     1    NA 19.75000 15.54295 23.95705     6    NA
# 3     1     0    NA 20.74286 18.45750 23.02822    NA     7
# 4     1     1    NA 28.37143 23.97129 32.77157    NA     7

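This gives strange results: instead of a single n column, each group's count lands in its own column named after the value (X12, X6, X7), and the remaining cells are filled with NA.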
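
The odd names come from the count being the only unnamed element of the combined vector: as.data.frame() builds a column name from the value itself (12 becomes X12), so each group produces differently named columns, and when the per-group rows are stacked the missing columns are filled with NA. A minimal base-R sketch of the mechanism; the numbers are copied from the first group's output above and merely stand in for c(length(.$mpg), smean.cl.normal(.$mpg)):

```r
# Stand-in for c(length(.$mpg), smean.cl.normal(.$mpg)) in the first group (vs = 0, am = 0)
unnamed <- c(12, Mean = 15.05, Lower = 13.28723, Upper = 16.81277)
# Same values, but with the count given an explicit name
named <- c(n = 12, Mean = 15.05, Lower = 13.28723, Upper = 16.81277)

bad  <- as.data.frame(as.list(unnamed))  # unnamed element gets a value-derived name
good <- as.data.frame(as.list(named))    # every group would share the column name "n"

print(names(bad))   # "X12" "Mean" "Lower" "Upper"
print(names(good))  # "n" "Mean" "Lower" "Upper"
```

By the same reasoning, writing c(n = length(.$mpg), smean.cl.normal(.$mpg)) inside the do() call should give a single n column, although c() would store the count as a double rather than an integer.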

Answers (1)

Jake Kaupp

You can accomplish this without do() by combining several tidyverse packages: tidyr, dplyr, purrr, and broom.

The reason for avoiding do() is that it will eventually be replaced by purrr-based approaches.

The steps are:

  1. group by vs and am.
  2. nest mpg into a list-column of data frames called data.
  3. create the stats and n columns as list-columns.
  4. unnest those list-columns into ordinary columns.
  5. drop the data list-column.

You do need some finagling in step 3 to get the output of smean.cl.normal into the proper form. My approach was to turn the named vector into a tidy data frame with broom::tidy and then use tidyr::spread to turn its rows into columns, which gives a one-row data frame per vs/am group. This approach can probably be improved, and I hope suggestions will be posted in the comments.

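library(Hmisc)
library(tidyverse)  # attaches dplyr, tidyr, purrr
library(broom)      # tidy(); installed with tidyverse but not attached by it

# Uses the nest()/unnest()/tidy() interfaces of the time; newer tidyr and
# broom releases may warn about or reject this usage.
mtcars %>% 
  group_by(vs, am) %>% 
  nest(mpg) %>%                  # one row per group, mpg stored in list-column `data`
  # stats: one-row data frame with Mean, Lower, Upper columns; n: group size
  mutate(stats = map(data, ~spread(tidy(smean.cl.normal(.x$mpg)), names, x)),
         n = map(data, nrow)) %>% 
  unnest(stats, n) %>%           # expand the list-columns into ordinary columns
  select(-data)                  # drop the nested data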
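
A possibly simpler variant, assuming dplyr 1.0 or later: summarise() splices an unnamed data-frame result into ordinary columns, so the count and the question's own as.data.frame(as.list(...)) conversion can sit side by side, without nest()/unnest() or the broom::tidy()/spread() step. A minimal sketch:

```r
library(dplyr)
library(Hmisc)  # loaded after dplyr, Hmisc masks summarize() but not summarise()

res <- mtcars %>%
  group_by(vs, am) %>%
  summarise(
    n = n(),                                       # group size (integer)
    as.data.frame(as.list(smean.cl.normal(mpg))),  # unnamed one-row frame, spliced into Mean, Lower, Upper
    .groups = "drop"                               # return an ungrouped tibble
  )
res
```

Here n comes from n() and stays an integer, as in the desired output; .groups = "drop" only controls whether the result remains grouped.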
