I've been struggling to get dcast to aggregate by taking the last element. Here's an example:
x <- data.table::data.table(foo = "bar", value = c(1, 0))
x
# foo value
# 1: bar 1
# 2: bar 0
data.table::dcast(x, ... ~ foo, fun.aggregate = function(x) x[length(x)])
# Error: Aggregating function(s) should take vector inputs and return a single value (length=1).
# However, function(s) returns length!=1. This value will have to be used to fill any missing
# combinations, and therefore must be length=1. Either override by setting the 'fill' argument
# explicitly or modify your function to handle this case appropriately.
This also happens with the reshape2 version of dcast, and when using a data.frame instead of a data.table.
There are ways I can get this to work. For example, I can use
data.table::dcast(x, ... ~ foo, fun.aggregate = function(x) rev(x)[1L])
# . bar
# 1: . 0
and get the expected result. The dplyr::last() function also works, but data.table::last() does not.
However, what I'm interested in is why using x[length(x)] doesn't work. If I put intermediate print commands in the aggregation function to work out what's happening, I get the following:
data.table::dcast(x, ... ~ foo,
fun.aggregate = function(x) {print(x); print(length(x)); 5L}, value.var = "value")
# numeric(0)
# [1] 0
# [1] 1 0
# [1] 2
# . bar
# 1: . 5
This suggests that dcast is iterating over a value of foo that is not in the table, and can't exist elsewhere since foo is a simple character vector, not a factor vector. What's happening?
R version: 3.6.0
data.table version: 1.12.2
Upvotes: 2
Views: 277
Reputation: 11898
It seems that both data.table::dcast.data.table() and reshape2::dcast() expect the aggregating function to return a length 1 value for length 0 input. Both functions try to get a "default value" to use by calling the aggregating function with a length 0 argument.
The relevant part of the data.table code looks like this:
fill.default = suppressWarnings(dat[0L][, eval(fun.call)])
if (nrow(fill.default) != 1L) stop(errmsg, call.=FALSE)
reshape2 calls vaggregate() from plyr, which has a similar part:
.default <- .fun(.value[0], ...)
So in the case of x[length(x)] the default value that both functions obtain is essentially:
last <- function(x) x[length(x)]
last(numeric())
#> numeric(0)
That is, a length 0 vector. But both functions require the default value to have length 1, thus the error.
Finally, dplyr::last() works because it returns NA for a length 0 input:
dplyr::last(numeric())
#> [1] NA
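Following the same reasoning, one way to keep the x[length(x)] idea is to guard the length 0 case so the function always returns a single value. The helper name last_or_na below is hypothetical (it is not part of data.table, reshape2 or dplyr), and this is only a minimal sketch of the idea, not the only possible fix:

```r
# Hypothetical helper: last element of x, or a typed NA when x is empty.
# Indexing with NA_integer_ yields a length 1 NA of the same type as x.
last_or_na <- function(x) {
  if (length(x) > 0L) x[length(x)] else x[NA_integer_]
}

last_or_na(numeric())
#> [1] NA
last_or_na(c(1, 0))
#> [1] 0

# Same data as in the question; guarded so the sketch also runs
# when data.table is not installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  x <- data.table::data.table(foo = "bar", value = c(1, 0))
  res <- data.table::dcast(x, ... ~ foo, fun.aggregate = last_or_na,
                           value.var = "value")
  print(res)
}
```

Because the length 0 call now returns a single NA, the default-value check described above should pass, and the last value of each group is used for the cells that actually have data.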