Why does dcast not accept x[length(x)]?

I've been struggling to get dcast to aggregate by taking the last element. Here's an example:

x <- data.table::data.table(foo = "bar", value = c(1, 0))
x

#    foo value
# 1: bar     1
# 2: bar     0

data.table::dcast(x, ... ~ foo, fun.aggregate = function(x) x[length(x)])

# Error: Aggregating function(s) should take vector inputs and return a single value (length=1).
# However, function(s) returns length!=1. This value will have to be used to fill any missing
# combinations, and therefore must be length=1. Either override by setting the 'fill' argument
# explicitly or modify your function to handle this case appropriately.

This also happens with the reshape2 version of dcast, and when using a data.frame instead of a data.table.

There are ways I can get this to work. For example, I can use

data.table::dcast(x, ... ~ foo, fun.aggregate = function(x) rev(x)[1L])

#    . bar
# 1: .   0

and get the expected result. The dplyr::last() function also works, but data.table::last() does not.

However, what I'm interested in is why using x[length(x)] doesn't work. If I put intermediate print commands in the aggregation function to work out what's happening, I get the following:

data.table::dcast(x, ... ~ foo,
                  fun.aggregate = function(x) {print(x); print(length(x)); 5L}, value.var = "value")

# numeric(0)
# [1] 0
# [1] 1 0
# [1] 2
#    . bar
# 1: .   5

This suggests that dcast is calling the aggregation function for a value of foo that is not in the table, and that can't exist anywhere else, since foo is a plain character vector rather than a factor. What's happening?

R version: 3.6.0; data.table version: 1.12.2

Answers (1)

Mikko Marttila

It seems that both data.table::dcast.data.table() and reshape2::dcast() expect the aggregating function to return a length 1 value for length 0 input. Both functions try to get a "default value" to use by calling the aggregating function with a length 0 argument.

The relevant part of the data.table source looks like this:

fill.default = suppressWarnings(dat[0L][, eval(fun.call)])
if (nrow(fill.default) != 1L) stop(errmsg, call.=FALSE)

reshape2 calls vaggregate() from plyr, which has a similar line:

.default <- .fun(.value[0], ...)

So in the case of x[length(x)] the default value that both functions obtain is essentially:

last <- function(x) x[length(x)]
last(numeric())
#> numeric(0)

That is a length 0 vector, but both functions require the default value to have length 1, hence the error.
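To make the mechanism concrete, here is a minimal base R sketch of that "default value" probe. The names probe_default, last_by_index and last_by_rev are hypothetical and only mimic the check described above; this is not the actual code of either package.

```r
# Hypothetical stand-in for the default-value probe (not the packages' real code)
probe_default <- function(fun, values) {
  # Call the aggregating function on a zero-length slice of the values
  default <- fun(values[0L])
  # Reject aggregators that do not return exactly one value for empty input
  if (length(default) != 1L) {
    stop("aggregating function must return length 1 for length-0 input")
  }
  default
}

last_by_index <- function(x) x[length(x)]  # x[0] on empty input: length 0
last_by_rev   <- function(x) rev(x)[1L]    # indexes past the end: NA, length 1

ok  <- probe_default(last_by_rev, c(1, 0))
bad <- try(probe_default(last_by_index, c(1, 0)), silent = TRUE)

is.na(ok)                   # TRUE
inherits(bad, "try-error")  # TRUE
```

This would also explain why rev(x)[1L] from the question works: indexing an empty vector at position 1 yields NA, whereas indexing it at position 0 yields an empty vector.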

Finally, dplyr::last() works because it returns NA for a length 0 input:

dplyr::last(numeric())
#> [1] NA
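If you would rather avoid the dplyr dependency, a small base R helper along these lines should behave the same way for empty input. The name safe_last is an assumption made for illustration, not a function from any of the packages mentioned.

```r
# Hypothetical helper: last element, or a typed NA of length 1 for empty input
safe_last <- function(x) {
  if (length(x) == 0L) {
    x[NA_integer_]   # indexing with NA returns a single NA of x's type
  } else {
    x[length(x)]
  }
}

safe_last(c(1, 0))      # 0
safe_last(numeric())    # NA
safe_last(character())  # NA (character)
```

Because it returns a length 1 value for length 0 input, a function like this should pass the default-value check when supplied as fun.aggregate in the dcast call from the question.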
