Reputation: 10996
I have data where I want to get a bunch of summary statistics for multiple columns with the tidyverse approach. However, utilizing tidyverse's summarize
function, it will create each column statistic as a new column, whereas I would prefer to see the column names as rows and each statistic as a new column. So my question is:
Is there a more elegant (and I know "elegant" is a vague term) way to achieve this than by accompanying the summarize
function with a pivot_longer
and pivot_wider
?
I'm using the latest dev versions of the tidyverse package, i.e. dplyr 0.8.99.9003 and tidyr 1.1.0. So it's fine if any solution requires new functions from these packages that are not yet on CRAN.
library(tidyverse)
dat <- as.data.frame(matrix(1:100, ncol = 5))
dat %>%
summarize(across(everything(), list(mean = mean,
sum = sum))) %>%
pivot_longer(cols = everything(),
names_sep = "_",
names_to = c("variable", "statistic")) %>%
pivot_wider(names_from = "statistic")
Expected outcome:
# A tibble: 5 x 3
variable mean sum
<chr> <dbl> <dbl>
1 V1 10.5 210
2 V2 30.5 610
3 V3 50.5 1010
4 V4 70.5 1410
5 V5 90.5 1810
Note: I'm not set on the name of any of the columns, so if there's a nice way to get the structure of the table with different/generic names, that'd also be fine.
Upvotes: 6
Views: 3190
Reputation: 35554
You can first stack all columns together and summarise by group.
dat %>%
pivot_longer(everything()) %>%
group_by(name) %>%
summarise_at("value", list(~mean(.), ~sum(.)))
# # A tibble: 5 x 3
# name mean sum
# <chr> <dbl> <int>
# 1 V1 10.5 210
# 2 V2 30.5 610
# 3 V3 50.5 1010
# 4 V4 70.5 1410
# 5 V5 90.5 1810
Upvotes: 2
Reputation: 388797
You can skip the pivot_wider
step by using ".value"
in names_to
.
library(dplyr)
dat %>%
summarise_all(list(mean = mean,sum = sum)) %>%
tidyr::pivot_longer(cols = everything(),
names_sep = "_",
names_to = c("variable", ".value"))
# A tibble: 5 x 3
# variable mean sum
# <chr> <dbl> <int>
#1 V1 10.5 210
#2 V2 30.5 610
#3 V3 50.5 1010
#4 V4 70.5 1410
#5 V5 90.5 1810
Upvotes: 5
Reputation: 27732
not a tidyverse
solution, but a data.table
one instead.. also, not sure if it is more 'elegant' ;-)
but here you go...
library( data.table )
#make 'dat' a data.table
setDT(dat)
#transpose, keeping column names
dat <- transpose(dat, keep.names = "var_name" )
#melt to long and summarise
melt(dat, id.vars = "var_name")[, .(mean = mean(value), sum = sum(value) ), by = var_name]
# var_name mean sum
# 1: V1 10.5 210
# 2: V2 30.5 610
# 3: V3 50.5 1010
# 4: V4 70.5 1410
# 5: V5 90.5 1810
Upvotes: 4