dplyr: specific summarise methods for certain columns, default for the rest

Question

I would like to summarise a grouped data frame. For certain columns I need a specific aggregation method (in the example below, concatenating the strings in d); for all other columns I want to apply a default method (first in the example below). I found a way to do this by splitting the column with the specific method off into another data frame, but it is quite complicated: 5 lines of code and 3 groupings. It also scales poorly if there are more columns like this, or if some of them are of type double. Is there an easier way to do this? For example, it would be perfect if we could pass first as the default method to summarise_all and give specific methods for only certain columns. I've read the docs and concluded that this is not possible.

require(dplyr)

df <- data.frame(
    a = sort(rep(letters[1:4], 5)),
    b = rep(letters[6:7], 10),
    c = rnorm(20, 1000, 500),
    d = rep(c('h', 'h', 'i', 'h'), 5)
)

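# First row of every (a, b, d) combination; result stays grouped by (a, b)
grp <- df %>% group_by(a, b, d) %>% summarise_all(first)
# Specific method for d: concatenate its values within each (a, b) group
grp_d <- grp %>% group_by(a, b) %>% summarise(d = paste(d, collapse = ""))
grp_d$d <- factor(grp_d$d)
# Default method (first) for all remaining columns
grp_othercols <- grp %>% group_by(a, b) %>% summarise_all(first)
# Swap the default-summarised d for the concatenated one
merged <- bind_cols(grp_othercols %>% select(-d),
                    as.data.frame(grp_d['d']))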

Edwin · Accepted Answer

Thanks to Axeman's comment, this can be done with just one grouping:

df %>% group_by(a, b, d) %>% summarise_all(first) %>% 
  mutate(d = factor(paste(d, collapse = ""))) %>% 
  summarise_all(first)
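
For reference, on dplyr 1.0.0 or later the same idea can probably be written as a single summarise() call using across(), which is not part of the original answer. This is a minimal sketch. It assumes the default for every non-grouping column other than d is first, and that the distinct values of d should be concatenated in sorted order (which is what the grouped version above ends up doing). set.seed(1) and the name res are only there for illustration.

```r
library(dplyr)

set.seed(1)  # only so the random column c is reproducible
df <- data.frame(
  a = sort(rep(letters[1:4], 5)),
  b = rep(letters[6:7], 10),
  c = rnorm(20, 1000, 500),
  d = rep(c("h", "h", "i", "h"), 5)
)

res <- df %>%
  group_by(a, b) %>%
  summarise(
    # default method: first() for every non-grouping column except d
    across(-d, first),
    # specific method for d: concatenate its distinct values, sorted
    d = paste(sort(unique(d)), collapse = ""),
    .groups = "drop"
  ) %>%
  mutate(d = factor(d))

print(res)
```

One difference to be aware of: here c is the first value of each (a, b) group in row order, whereas the pipeline above takes the first value from the lowest-sorting d within the group. More column-specific methods can be added as further named arguments after across(), as long as those columns are also excluded from the across() selection.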
