Mert Nuhoglu

Reputation: 10133

Aggregate strings using c() in dplyr summarize or aggregate

I want to aggregate some strings in dplyr, using c() as the aggregation function. I first tried the following:

> InsectSprays$spray = as.character(InsectSprays$spray)
> dt = tbl_df(InsectSprays)
> dt %>% group_by(count) %>% summarize(c(spray))
Error: expecting a single value

However, using c() as the function in aggregate() works:

> da = aggregate(spray ~ count, InsectSprays, c)
> head(da)
  count                  spray
1     0                   C, C
2     1       C, C, C, C, E, E
3     2             C, C, D, E
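A quick way to see why aggregate() accepts c() is to inspect what it returns. The sketch below is not from the original post: it works on a copy of InsectSprays, and `sprays` and `da` are illustrative names. Because the groups have different sizes, aggregate() cannot simplify the per-group results into a vector or matrix, so it leaves them as a list column.

```r
# Work on a copy so the built-in dataset is left untouched
sprays <- InsectSprays
sprays$spray <- as.character(sprays$spray)

da <- aggregate(spray ~ count, data = sprays, FUN = c)

# Group sizes differ (2, 6, 4, ...), so each group's vector is kept
# as one element of a list column instead of being simplified
is.list(da$spray)
da$spray[[1]]
```

The `C, C` shown when printing `da` is just how a list column element is displayed; the underlying value is the character vector `c("C", "C")`.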

Searching Stack Overflow suggested that using paste() with collapse, instead of c(), would solve the problem:

dt %>% group_by(count) %>% summarize(s=paste(spray, collapse=","))

or

dt %>% group_by(count) %>% summarize(paste(c(spray), collapse=","))
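The paste() workaround succeeds because collapse turns each group into a single string, i.e. exactly one value per group. As a base-R cross-check (a sketch, not part of the original question; `sprays` and `dp` are illustrative names), the same collapse can be passed through aggregate(), which forwards extra arguments to FUN:

```r
sprays <- InsectSprays
sprays$spray <- as.character(sprays$spray)

# collapse = "," is forwarded to paste(), so each count value
# yields one string such as "C,C" rather than a vector
dp <- aggregate(spray ~ count, data = sprays, FUN = paste, collapse = ",")
head(dp, 3)
```

Here every group result has length one, so aggregate() simplifies `dp$spray` to an ordinary character vector instead of a list column.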

My question is: why does the c() function work in aggregate() but not in dplyr's summarize()?

Upvotes: 5

Views: 5108

Answers (1)

Rich Scriven

Reputation: 99331

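If you look more closely, you can see that c() actually does work (to a certain extent) when used inside do(). summarize() expects each expression to return a single value per group, which c(spray) does not; do() has no such restriction and stores each group's result as one element of a list column. To my understanding, however, dplyr does not currently print such list columns in full, only the type and length of each element: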

> InsectSprays$spray = as.character(InsectSprays$spray)
> dt = tbl_df(InsectSprays)
> doC <- dt %>% group_by(count) %>% do(s = c(.$spray))
> head(doC)
Source: local data frame [6 x 2]

  count        s
1     0 <chr[2]>
2     1 <chr[6]>
3     2 <chr[4]>
4     3 <chr[8]>
5     4 <chr[4]>
6     5 <chr[7]>

> head(doC)[[2]]
[[1]]
[1] "C" "C"

[[2]]
[1] "C" "C" "C" "C" "E" "E"

[[3]]
[1] "C" "C" "D" "E"

[[4]]
[1] "C" "C" "D" "D" "E" "E" "E" "E"

[[5]]
[1] "C" "D" "D" "E"

[[6]]
[1] "D" "D" "D" "D" "D" "E" "E"
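In more recent dplyr versions the same list column can be built without do(). The sketch below assumes a current dplyr installation (reframe() requires dplyr 1.1.0 or later); `sprays`, `by_list` and `by_rows` are illustrative names. Wrapping the vector in list() gives summarise() one value per group, while reframe() allows any number of rows per group, so c() is accepted there directly.

```r
library(dplyr)

sprays <- InsectSprays
sprays$spray <- as.character(sprays$spray)

# list() wraps each group's vector into a single list element,
# so summarise() gets one value per group (a list column, like do())
by_list <- sprays %>%
  group_by(count) %>%
  summarise(s = list(spray))
by_list$s[[1]]

# dplyr >= 1.1.0: reframe() accepts results of any length,
# producing one row per element instead of a list column
by_rows <- sprays %>%
  group_by(count) %>%
  reframe(spray = c(spray))
nrow(by_rows)
```

The list() form keeps the grouped vectors together, mirroring the do() output above; the reframe() form is closer in shape to the original data, with one row per observation.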

Upvotes: 5
