Andreas

Reputation: 1953

Applying the Data Table Join Operator to a List of Data Tables

Is it possible to apply the [ operator, which data.table uses for joins, to a list of data.tables? The join works when I call [ on each element of the list individually, but I get an error when I apply it to the whole list with lapply.

### Load data.table
library(data.table)

### Create master data.table
data <- data.table(id = letters[1:10], val = 1:10, key = 'id')

### Create data tables to be joined
a <- data.table(id = letters[1:10], height = rnorm(n = 10, mean = 150, sd = 10), key = 'id')
b <- data.table(id = letters[1:10], weight = rnorm(n = 10, mean = 140, sd = 20), key = 'id')

### Create a list of data tables to be joined
l <- list(a, b)

### Join data tables (Works)
`[`(l[[1]], data)
`[`(l[[2]], data)

### Apply join function to a list. Doesn't work. Why?
lapply(l, `[`, data)
# Error in `[.default`(x, i) : invalid subscript type 'list'

This error makes me wonder how R is able to distinguish when [ is used for a join, versus when it is used to extract elements from an object. For example:

### Extract first column from each data.table in 'l'
lapply(l, `[`, 1)

Upvotes: 1

Views: 216

Answers (1)

Hong Ooi

Reputation: 57696

Per ?lapply:

For historical reasons, the calls created by lapply are unevaluated, and code has been written (e.g. bquote) that relies on this. This means that the recorded call is always of the form FUN(X[[0L]], ...), with 0L replaced by the current integer index. This is not normally a problem, but it can be if FUN uses sys.call or match.call or if it is a primitive function that makes use of the call. This means that it is often safer to call primitive functions with a wrapper, so that e.g. lapply(ll, function(x) is.numeric(x)) is required in R 2.7.1 to ensure that method dispatch for is.numeric occurs correctly.
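To see what the help page means by the recorded call, here is a small base R sketch. The helper show_call is a hypothetical name introduced only for illustration; it returns the call by which it was invoked. The exact text of the recorded call differs between R versions, so no output is shown for the direct case.

```r
# Hypothetical helper: returns (as text) the call by which it was invoked
show_call <- function(x) deparse(sys.call())

# Passed directly to lapply, the recorded call is the one lapply builds
# internally, of the form FUN(X[[...]], ...)
direct <- lapply(1:2, show_call)

# Called through a wrapper, the recorded call is the ordinary one
# written inside the wrapper
wrapped <- lapply(1:2, function(x) show_call(x))

print(direct[[1]])
print(wrapped[[1]])
```

Any function that inspects its own call sees a different call in the two cases, which is why the wrapper can change behaviour.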

You need to wrap [ in a function, so that it is invoked through an ordinary call rather than the call that lapply constructs:

lapply(l, function(d) `[`(d, data))
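For completeness, a self-contained sketch of the wrapper applied to the data from the question. It assumes the data.table package is installed, and it writes the join as d[data], which is the same call as `[`(d, data). The name joined is introduced here for illustration.

```r
library(data.table)

# Master table and the tables to join, as in the question
data <- data.table(id = letters[1:10], val = 1:10, key = "id")
a <- data.table(id = letters[1:10], height = rnorm(n = 10, mean = 150, sd = 10), key = "id")
b <- data.table(id = letters[1:10], weight = rnorm(n = 10, mean = 140, sd = 20), key = "id")
l <- list(a, b)

# Wrap the join in an anonymous function so that [ is called normally
joined <- lapply(l, function(d) d[data])

# Each element now carries the 'val' column from the master table
names(joined[[1]])
names(joined[[2]])
```

Each element of joined is one of the original tables joined to data on the shared key id.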

Upvotes: 4
