aL3xa

Reputation: 36080

Aggregate data using each() with reshape2::dcast

I usually use the reshape package to aggregate data (duh), together with plyr because of its uber-awesome function each. Recently I received a suggestion to switch to reshape2 and try it out, and now I can't seem to use the each wizardry anymore.

reshape

> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> cast(m, am + vs ~ variable, each(min, max, mean, sd))
  am vs hp_min hp_max   hp_mean    hp_sd
1  0  0    150    245 194.16667 33.35984
2  0  1     62    123 102.14286 20.93186
3  1  0     91    335 180.83333 98.81582
4  1  1     52    113  80.57143 24.14441

reshape2

> require(plyr)
> m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")
> dcast(m, am + vs ~ variable, each(min, max, mean, sd))
Error in structure(ordered, dim = ns) : 
  dims [product 4] do not match the length of object [16]
In addition: Warning messages:
1: In fs[[i]](x, ...) : no non-missing arguments to min; returning Inf
2: In fs[[i]](x, ...) : no non-missing arguments to max; returning -Inf

I wasn't in the mood to dig into this, as my previous code works like a charm with reshape, but I'd really like to know:

  1. Is it possible to use each with dcast?
  2. Is it advisable to use reshape2 at all? Is reshape deprecated?

Upvotes: 4

Views: 1843

Answers (1)

joran

Reputation: 173547

The answer to your first question appears to be no. Quoting from ?reshape2:::dcast:

If the combination of variables you supply does not uniquely identify one row in the original data set, you will need to supply an aggregating function, fun.aggregate. This function should take a vector of numbers and return a single summary statistic.
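To illustrate the requirement quoted above, here is a minimal sketch, assuming reshape2 is installed and using the same melted mtcars data as in the question. The name wide_mean is only illustrative. With a function that returns a single number per group, dcast should go through without the dimension error:

```r
library(reshape2)

# Same melted data as in the question: one "hp" value per car
m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# fun.aggregate returns one number per (am, vs) group, which is
# what dcast expects; each(min, max, mean, sd) returns four
wide_mean <- dcast(m, am + vs ~ variable, fun.aggregate = mean)
wide_mean
```

The hp column here should line up with the hp_mean column from the reshape output in the question.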

A look at Hadley's GitHub page for reshape2 suggests that he knows this functionality was removed, but he seems to think it's better done in plyr, presumably with something like:

ddply(m, .(am, vs), summarise,
      min  = min(value),
      max  = max(value),
      mean = mean(value),
      sd   = sd(value))

or if you really want to keep using each:

ddply(m, .(am, vs), function(x) each(min, max, mean, sd)(x$value))
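For completeness, a self-contained sketch of the each approach, assuming both plyr and reshape2 are installed. The names hp_stats, res and stat_cols are made up for this example, and the renaming step is just one way to get column names resembling the old cast output:

```r
library(plyr)
library(reshape2)

# Same melted data as in the question
m <- melt(mtcars, id.vars = c("am", "vs"), measure.vars = "hp")

# each() bundles several functions into one that returns a named vector
hp_stats <- each(min, max, mean, sd)

# One row per (am, vs) group, one column per statistic
res <- ddply(m, .(am, vs), function(x) hp_stats(x$value))

# Prefix the statistic columns so they resemble hp_min, hp_max, ...
stat_cols <- setdiff(names(res), c("am", "vs"))
names(res)[match(stat_cols, names(res))] <- paste0("hp_", stat_cols)
res
```

This should give the same four rows as the reshape version in the question, just computed by plyr instead of cast.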

Upvotes: 5
