Reputation: 4586
I would like to summarise a grouped data frame. For certain columns I need a specific aggregation method (in the example below, concatenating the strings in d); for the other columns I apply a default method (first, below). I found a way to do this by splitting off the column with the specific method into another data frame. This is quite complicated: 5 lines of code and 3 groupings. And what if I have more columns like this, or some of them are of type double? Is there an easier way to do this? For example, it would be perfect if we could pass first as the default method for summarise_all and specify methods for only certain columns. I've read the docs and concluded that this is not possible.
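library(dplyr)

df <- data.frame(
  a = sort(rep(letters[1:4], 5)),
  b = rep(letters[6:7], 10),
  c = rnorm(20, 1000, 500),
  d = rep(c('h', 'h', 'i', 'h'), 5)
)

# First value of every other column per (a, b, d) combination
grp <- df %>% group_by(a, b, d) %>% summarise_all(first)
# Specific method for d: concatenate its values within each (a, b) group
grp_d <- grp %>% group_by(a, b) %>% summarise(d = paste(d, collapse = ""))
grp_d$d <- factor(grp_d$d)
# Default method (first) for the remaining columns
grp_othercols <- grp %>% group_by(a, b) %>% summarise_all(first)
# Recombine: default-summarised columns plus the specially summarised d
merged <- bind_cols(grp_othercols %>% select(-d),
                    as.data.frame(grp_d['d']))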
Upvotes: 0
Views: 680
Reputation: 13680
We can also pass multiple functions to summarise_all and then select only the columns we are interested in:
df %>%
  group_by(a, b) %>%
  arrange(a, b, d) %>%
  # A named list of functions is used here because funs() is deprecated since dplyr 0.8
  summarise_all(list(paste = ~ paste(unique(.), collapse = ''), f = first)) %>%
  select(-c_paste, -d_f)
Note the arrange(): since we never group on d, the rows are not sorted by it, so without the arrange() both first and the pasted string can give slightly different results.
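To make the point about arrange() concrete, here is a minimal sketch on a tiny made-up data frame (toy is hypothetical, not the asker's data) with a single (a, b) group whose d values are not in sorted order. It assumes dplyr 1.0 or later for the .groups argument.

```r
library(dplyr)

# Hypothetical two-row data frame: one (a, b) group, d deliberately unsorted
toy <- data.frame(a = "b", b = "f", c = c(10, 20), d = c("i", "h"))

# Without arrange(): rows keep their original order inside the group
unsorted <- toy %>%
  group_by(a, b) %>%
  summarise(c = first(c), d = paste(unique(d), collapse = ""), .groups = "drop")

# With arrange(): rows are sorted by d before summarising
sorted <- toy %>%
  arrange(a, b, d) %>%
  group_by(a, b) %>%
  summarise(c = first(c), d = paste(unique(d), collapse = ""), .groups = "drop")

unsorted$d  # "ih"
sorted$d    # "hi"
unsorted$c  # 10
sorted$c    # 20
```

Grouping on d as well, as in the question, sorts the groups by d implicitly, which is why the explicit arrange() is needed to reproduce that output.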
Upvotes: 1
Reputation: 3252
Thanks to Axeman's comment, just one grouping:
df %>%
  group_by(a, b, d) %>%
  summarise_all(first) %>%
  mutate(d = factor(paste(d, collapse = ""))) %>%
  summarise_all(first)
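For readers on dplyr 1.0 or later, the "default method plus a few exceptions" pattern asked for in the question can probably also be sketched with across(). This is not taken from the answers above; it is an alternative under the assumption that across() is available, and set.seed() is added only to make the example reproducible.

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible
df <- data.frame(
  a = sort(rep(letters[1:4], 5)),
  b = rep(letters[6:7], 10),
  c = rnorm(20, 1000, 500),
  d = rep(c("h", "h", "i", "h"), 5)
)

result <- df %>%
  arrange(a, b, d) %>%            # sort so first() and the pasted string are stable
  group_by(a, b) %>%
  summarise(
    across(-d, first),            # default method for every non-grouping column except d
    d = factor(paste(unique(d), collapse = "")),  # specific method for d
    .groups = "drop"
  )

result
```

The across() call comes before the d = ... expression so that the default is applied to the other columns first and d is computed from its original values. Further exceptions can be handled by excluding them in across(), for example across(-c(d, e), first) for a hypothetical extra column e, and adding one named expression per exception.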
Upvotes: 2