deeenes

Reputation: 4586

dplyr: specific summarise methods for certain columns, default for the rest

I would like to summarise a grouped data frame. For certain columns I need a specific aggregation method (in the example below, concatenating the strings in d), while for all other columns a default method should apply (first in the example below). I found a way to do this by splitting the column that needs the specific method off into another data frame, but this is quite complicated: 5 lines of code and 3 groupings. And what if I have more such columns, or some of them are of type double? Is there an easier way to do this? For example, it would be perfect if we could pass first as a default method to summarise_all and specify other methods for only certain columns. I've read the docs and concluded that this is not possible.

library(dplyr)

df <- data.frame(
    a = sort(rep(letters[1:4], 5)),
    b = rep(letters[6:7], 10),
    c = rnorm(20, 1000, 500),
    d = rep(c('h', 'h', 'i', 'h'), 5)
)

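# one row per (a, b, d) combination, default method for the other columns
grp <- df %>% group_by(a, b, d) %>% summarise_all(first)
# specific method for d: concatenate its values within each (a, b) group
grp_d <- grp %>% group_by(a, b) %>% summarise(d = paste(d, collapse = ""))
grp_d$d <- factor(grp_d$d)
# default method for everything else, then swap in the concatenated d
grp_othercols <- grp %>% group_by(a, b) %>% summarise_all(first)
merged <- bind_cols(grp_othercols %>% select(-d),
                    as.data.frame(grp_d['d']))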

Upvotes: 0

Views: 680

Answers (2)

GGamba

Reputation: 13680

We can also pass multiple functions to summarise_all and then select only the columns we are interested in:

df %>% 
    group_by(a, b) %>% 
    arrange(a, b, d) %>%
    summarise_all(funs(paste = paste(unique(.), collapse = ''), f = first)) %>% 
    select(-c_paste, -d_f)

Note the arrange(): because we never group on d, the rows are not sorted by d within each group, so without it first and paste would give slightly different results.

Upvotes: 1

Edwin

Reputation: 3252

Thanks to Axeman's comment, just one grouping:

df %>% group_by(a, b, d) %>% summarise_all(first) %>% 
  mutate(d = factor(paste(d, collapse = ""))) %>% 
  summarise_all(first)
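
For comparison, here is a minimal sketch of the same idea with a single grouping, assuming dplyr 1.0.0 or later, where across() is available. Neither across() nor the .groups argument appears in the thread above, and the set.seed() call is only added to make the example reproducible. The default method is applied to every column except d, and d gets its own method afterwards:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed, only so the example is reproducible

df <- data.frame(
    a = sort(rep(letters[1:4], 5)),
    b = rep(letters[6:7], 10),
    c = rnorm(20, 1000, 500),
    d = rep(c('h', 'h', 'i', 'h'), 5)
)

res <- df %>%
    arrange(a, b, d) %>%      # sort by d so first() and paste() see ordered rows
    group_by(a, b) %>%
    summarise(
        across(-d, first),                     # default method for all other columns
        d = paste(unique(d), collapse = ""),   # specific method for d
        .groups = "drop"
    )
```

across(-d, first) is placed before the d expression on purpose: inside summarise(), later expressions see columns already replaced by earlier ones, so listing d first would hand across() the summarised d rather than the original column. With this data the result should have one row per (a, b) pair.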

Upvotes: 2
