Reputation: 4960
I have a dataframe containing groups that I want to collapse into single entries in such a way that the values of the new rows are either the minimum or maximum values within each group, depending on the column name.
For example, given:
set.seed(1)
dat <- data.frame(grp = c('A', 'A', 'B', 'B'),
v1_high = rnorm(4),
v2_high = rnorm(4),
v3_low = rnorm(4))
# original
grp v1_high v2_high v3_low
1 A -0.6264538 0.3295078 0.5757814
2 A 0.1836433 -0.8204684 -0.3053884
3 B -0.8356286 0.4874291 1.5117812
4 B 1.5952808 0.7383247 0.3898432
I would like to generate a new dataframe with two entries corresponding to the groups "A" and "B" with the maximum values of each column ending in _high
and the minimum values of each column ending in _low
.
In this case:
# desired result
grp v1_high v2_high v3_low
1 A 0.1836433 0.3295078 -0.3053884
2 B 1.5952808 0.7383247 0.3898432
Finally, the number and names of such columns are not known beforehand.
Ideally, the summarization would also handle the case where there are no _min
columns present, and only _max
ones, although I can always check for this scenario manually and handle the two cases in separately using an if () { .. }
statement.
Any suggestions? It seems like this should be possible using the summarize_at() function, but I haven't figured out how to apply different functions to different sets of columns.
Upvotes: 2
Views: 2574
Reputation: 887088
An option would be to group by 'grp', get the max
of columns that end with 'high' (column names), use that also as grouping column and get the min
of columns that end with 'low'
library(dplyr)
dat %>%
group_by(grp) %>%
mutate_at(vars(ends_with('high')), max) %>%
group_by_at(vars(ends_with('high')), .add = TRUE) %>%
summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 4
# Groups: grp, v1_high [2]
# grp v1_high v2_high v3_low
# <fct> <dbl> <dbl> <dbl>
#1 A 0.184 0.330 -0.305
#2 B 1.60 0.738 0.390
It would also work if there are no min
columns
dat[-4] %>%
group_by(grp) %>%
mutate_at(vars(ends_with('high')), max) %>%
group_by_at(vars(ends_with('high')), .add = TRUE) %>%
summarise_at(vars(ends_with('low')), min)
# A tibble: 2 x 3
# Groups: grp, v1_high [2]
# grp v1_high v2_high
# <fct> <dbl> <dbl>
#1 A 0.184 0.330
#2 B 1.60 0.738
Or another option is map2
library(purrr)
map2(list(min, max), list('low', 'high'), ~
dat %>%
select(grp, ends_with(.y)) %>%
group_by(grp) %>%
summarise_all(.x)) %>%
reduce(inner_join, by = 'grp')
Upvotes: 1