I am working on a grouped dataset and want to add four summary statistics as four new columns: count, mean, lower confidence limit, and upper confidence limit.
I computed the mean and the confidence limits as follows:
library(Hmisc)
library(dplyr)
# summarize count, mean, confidence intervals and make four new columns;
mtcars %>%
  group_by(vs, am) %>%
  do(
    as.data.frame(as.list(smean.cl.normal(.$mpg)))
  )
# vs am Mean Lower Upper
# <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 0 15.05000 13.28723 16.81277
# 2 0 1 19.75000 15.54295 23.95705
# 3 1 0 20.74286 18.45750 23.02822
# 4 1 1 28.37143 23.97129 32.77157
However, when I add the count, the new columns become two list columns:
df <- mtcars %>%
  group_by(vs, am) %>%
  do(
    n = length(.$mpg),
    stats = smean.cl.normal(.$mpg)
  )
# # A tibble: 4 × 4
# vs am n stats
# * <dbl> <dbl> <list> <list>
# 1 0 0 <int [1]> <dbl [3]>
# 2 0 1 <int [1]> <dbl [3]>
# 3 1 0 <int [1]> <dbl [3]>
# 4 1 1 <int [1]> <dbl [3]>
My desired output is:
# vs am n Mean Lower Upper
# <dbl> <dbl> <int> <dbl> <dbl> <dbl>
# 1 0 0 12 15.05000 13.28723 16.81277
# 2 0 1 6 19.75000 15.54295 23.95705
# 3 1 0 7 20.74286 18.45750 23.02822
# 4 1 1 7 28.37143 23.97129 32.77157
How should I achieve this conveniently?
Thanks in advance.
I also tried:
mtcars %>%
  group_by(vs, am) %>%
  do(
    as.data.frame(as.list(c(length(.$mpg), smean.cl.normal(.$mpg))))
  )
# Source: local data frame [4 x 8]
# Groups: vs, am [4]
#
# vs am X12 Mean Lower Upper X6 X7
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 0 0 12 15.05000 13.28723 16.81277 NA NA
# 2 0 1 NA 19.75000 15.54295 23.95705 6 NA
# 3 1 0 NA 20.74286 18.45750 23.02822 NA 7
# 4 1 1 NA 28.37143 23.97129 32.77157 NA 7
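This gives strange results: the count lands in a differently named column (X12, X6, X7) for each group size, and the rest are filled with NA.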
Reputation: 8072
You can accomplish this without do by using several tidyverse packages, namely tidyr, dplyr, purrr and broom. The reason for avoiding do is that it will eventually be replaced by purrr.
The pipeline below groups the data by vs and am, nests mpg within each group, maps the summary functions over the nested data, and then unnests the results.
You do need some finagling to get the output of smean.cl.normal into the proper form in the mapping step. My approach was to transform the output into a tidy data frame with broom::tidy and then use tidyr::spread to turn the rows into columns, which puts it in the proper tidy form for each vs/am group. This approach can probably be improved, and I hope any suggestions will be posted in the comments.
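library(Hmisc)
library(tidyverse)
library(broom)  # not attached by library(tidyverse); needed for tidy()
mtcars %>%
  group_by(vs, am) %>%
  nest(mpg) %>%  # one nested data frame of mpg values per vs/am group
  mutate(
    # named vector -> long data frame (names, x) -> one wide row per group
    stats = map(data, ~spread(tidy(smean.cl.normal(.x$mpg)), names, x)),
    n = map(data, nrow)  # group size
  ) %>%
  unnest(stats, n) %>%
  select(-data)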
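As a possible simplification, here is a minimal sketch that gets the same four columns with a plain summarise call. It is not part of the approach above, and it assumes dplyr 1.0.0 or later (for the .groups argument). The variable name result is only illustrative. It indexes the named vector returned by smean.cl.normal by name, and it uses the summarise spelling because Hmisc defines its own summarize.

```r
library(Hmisc)
library(dplyr)

# Sketch: one summarise() call per statistic, assuming dplyr >= 1.0.0.
# smean.cl.normal() returns a named vector with elements Mean, Lower, Upper,
# so each element is pulled out by name into its own column.
result <- mtcars %>%
  group_by(vs, am) %>%
  summarise(
    n     = n(),                                  # group size (integer)
    Mean  = smean.cl.normal(mpg)[["Mean"]],
    Lower = smean.cl.normal(mpg)[["Lower"]],
    Upper = smean.cl.normal(mpg)[["Upper"]],
    .groups = "drop"                              # return an ungrouped tibble
  )

print(result)
```

This calls smean.cl.normal three times per group, which is wasteful but harmless on data of this size; the nest/map version above computes it once per group.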