Reputation: 2602
I need to summarize a data.frame across multiple columns in a generic way.
The first summarize operation is easy, e.g. a simple median.
The second summarize operation depends on another column, e.g. taking the value in the row where another column is at its minimum (by group):
library(dplyr)
set.seed(4)
myDF = data.frame(i = rep(1:3, each=3),
                  j = rnorm(9),
                  a = sample.int(9),
                  b = sample.int(9),
                  c = sample.int(9),
                  d = 'foo')
# i j a b c d
# 1 1 0.2167549 4 5 5 foo
# 2 1 -0.5424926 7 7 4 foo
# 3 1 0.8911446 3 9 1 foo
# 4 2 0.5959806 8 6 8 foo
# 5 2 1.6356180 6 8 3 foo
# 6 2 0.6892754 1 4 6 foo
# 7 3 -1.2812466 9 1 7 foo
# 8 3 -0.2131445 5 2 2 foo
# 9 3 1.8965399 2 3 9 foo
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), median, .names="med_{col}"),
            best_a = a[[which.min(j)]],
            best_b = b[[which.min(j)]],
            best_c = c[[which.min(j)]])
# # A tibble: 3 x 8
# i med_j med_a med_b med_c best_a best_b best_c
# * <int> <dbl> <int> <int> <int> <int> <int> <int>
# 1 1 0.217 4 7 4 7 7 4
# 2 2 0.689 6 6 6 8 6 8
# 3 3 -0.213 5 2 7 9 1 7
How can I define this second summarize operation in a generic way (i.e., without writing out each column manually as above)?
I would need something like this (which does not work, because j is not found):
myfns = list(med = ~median(.),
             best = ~.[[which.min(j)]])
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), myfns, .names="{fn}_{col}"))
# Error: Problem with `summarise()` input `..1`.
# x object 'j' not found
# ℹ Input `..1` is `across(where(is.numeric), myfns, .names = "{fn}_{col}")`.
# ℹ The error occurred in group 1: i = 1.
Upvotes: 1
Views: 681
Reputation: 389265
Use a second across() call to pick, for each of the columns a:c, the value in the row where j is at its minimum.
library(dplyr)
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric), median, .names = "med_{col}"),
            across(a:c, ~.[which.min(j)], .names = "best_{col}"))
# i med_j med_a med_b med_c best_a best_b best_c
#* <int> <dbl> <int> <int> <int> <int> <int> <int>
#1 1 0.217 4 7 4 7 7 4
#2 2 0.689 6 6 6 8 6 8
#3 3 -0.213 5 2 7 9 1 7
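If the target columns should not be hard-coded as a:c, the selection in the second across() can itself be made generic. The following is a minimal sketch, assuming dplyr >= 1.0.0 (where tidyselect's where() is available); toyDF and its values are made up for illustration, so the result does not depend on the random seed.

```r
library(dplyr)

# Hypothetical toy data with fixed values (not the myDF from the question).
toyDF <- data.frame(
  i = rep(1:2, each = 3),
  j = c(0.5, -1.0, 2.0, 3.0, 1.5, 0.1),
  a = c(10L, 20L, 30L, 40L, 50L, 60L),
  b = c(1L, 2L, 3L, 4L, 5L, 6L),
  d = "foo"
)

best <- toyDF %>%
  group_by(i) %>%
  # All numeric columns except j; the grouping column i is skipped by across().
  summarize(across(where(is.numeric) & !j,
                   ~ .x[which.min(j)],
                   .names = "best_{.col}"))

best
```

Within each group, j is smallest in row 2 for i = 1 and in row 3 for i = 2, so best_a and best_b hold the values of a and b from those rows.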
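To do it in a single across() call, define the functions inline (note that this also produces a best_j column, which is simply the group minimum of j):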
myDF %>%
  group_by(i) %>%
  summarize(across(where(is.numeric),
                   list(med = median,
                        best = ~.[which.min(j)]),
                   .names = "{fn}_{col}"))
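A plausible explanation for why this works while the predefined myfns list in the question fails: a formula remembers the environment in which it was created. A formula written inside summarize() is created where the columns of the current group are visible, whereas one created at the top level only sees the global environment, where j does not exist. The sketch below checks the inline version on made-up data; toyDF is hypothetical and dplyr >= 1.0.0 is assumed.

```r
library(dplyr)

# Hypothetical toy data with fixed values (not the myDF from the question).
toyDF <- data.frame(
  i = rep(1:2, each = 3),
  j = c(0.5, -1.0, 2.0, 3.0, 1.5, 0.1),
  a = c(10L, 20L, 30L, 40L, 50L, 60L),
  b = c(1L, 2L, 3L, 4L, 5L, 6L),
  d = "foo"
)

res <- toyDF %>%
  group_by(i) %>%
  # Both functions are defined inline, so the lambda can refer to column j.
  summarize(across(where(is.numeric),
                   list(med = median,
                        best = ~ .x[which.min(j)]),
                   .names = "{.fn}_{.col}"))

res
```

Each numeric column yields a med_ and a best_ column; best_j is the per-group minimum of j and can be dropped with select(-best_j) if it is not wanted.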
Upvotes: 1