Daryl McCullough

Reputation: 301

R data tables accessing columns by name

If I have a data table, foo, in R with a column named "date", I can get the vector of date values by the notation

foo[, date]

(Unlike with data frames, date doesn't need to be in quotes.)

How can this be done programmatically? That is, if I have a variable x whose value is the string "date", how do I access the column of foo with that name?

Something that sort of works is to create a symbol:

sym <- as.name(x)
v <- foo[, eval(sym)]

...

As I say, that sort of works, but there is something not quite right about it. If that code is inside a function myFun in package myPackage, then it seems that it doesn't work if I explicitly use the package through:

myPackage::myFun(...)

I get an error message saying "undefined columns selected".

[edited] Some more details

Suppose I create a package called myPackage. This package has a single file with the following in it:

library(data.table)
#' @export
myFun <- function(table1) {
    names1 <- names(table1)
    name1 <- names1[[1]]
    sym <- as.Name(name1)
    table1[, eval(sym)]
}

If I load that function using RStudio, then

myFun(tbl)

returns the first column of the data table tbl.

On the other hand, if I call

myPackage::myFun(tbl)

it doesn't work. It complains about

Error in .subset(x, j) : invalid subscript type 'builtin'

I'm just curious as to why myPackage:: would make this difference.

Upvotes: 1

Views: 3638

Answers (2)

geneorama

Reputation: 3720

I think the problem is that you've defined myFun in your global environment, so it only appeared to work.

I changed as.Name to as.name, and created a package with the following functions:

library(data.table)
myFun <- function(table1) {
    names1 <- names(table1)
    name1 <- names1[[1]]
    sym <- as.name(name1)
    table1[, eval(sym)]
}
myFun_mod <- function(dt) {
    # dt[, eval(as.name(colnames(dt)[1]))]
    dt[[colnames(dt)[1]]]
}

Then, I tested it using this:

library(data.table)
myDt <- data.table(a=letters[1:3],b=1:3)
myFun(myDt)
myFun_mod(myDt)

myFun didn't work; myFun_mod did.

The output:

> library(test)
> myFun(myDt)
Error in eval(expr, envir, enclos) : object 'a' not found
> myFun_mod(myDt)
[1] "a" "b" "c"
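myFun_mod works here because `[[` is ordinary list-style extraction and does not rely on data.table's non-standard evaluation. As a minimal sketch (the variable `x` and the result names are only for illustration), here are a few ways to pull a column by a name stored in a string. The forms that go through `[` assume data.table's own `[` method is in effect, as it is at the console after library(data.table):

```r
library(data.table)

myDt <- data.table(a = letters[1:3], b = 1:3)
x <- "a"  # column name held in a variable (illustrative)

v1 <- myDt[[x]]                  # list-style extraction; also works for data.frames
v2 <- myDt[, get(x)]             # look the name up among the table's columns
v3 <- myDt[, eval(as.name(x))]   # the symbol-based approach from the question

print(v1)
```

All three are meant to return the column as a plain vector; `[[` is probably the least surprising choice inside package code.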

Then I added the following line to the NAMESPACE file: import(data.table)

This is what @mnel was talking about with this link: Using data.table package inside my own package

After adding import(data.table), both functions work.
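If the package is documented with roxygen2, the same NAMESPACE line can be generated from a tag rather than written by hand. A rough sketch, assuming roxygen2 is in use and data.table is also listed under Imports in DESCRIPTION; `first_column` is a made-up name, not something from the original package:

```r
#' Return the first column of a data.table
#'
#' @param dt A data.table.
#' @return The first column as a vector.
#' @import data.table
#' @export
first_column <- function(dt) {
    # Build a symbol from the first column name and let data.table's `[`
    # evaluate it among the columns. This relies on the import above.
    sym <- as.name(names(dt)[[1]])
    dt[, eval(sym)]
}
```

Note that package code should not call library(data.table) itself; the import directive is what makes data.table's `[` method apply inside the package's functions.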

I'm still not sure why you got the particular .subset error, which is why I went through the effort of reproducing the result...
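One possible explanation, which is a guess and not confirmed by the question: without the import, the table is treated like a plain data frame, so eval(sym) is evaluated in the function's own frame rather than among the columns. If the first column happens to share its name with a base function such as `c`, the lookup finds that function and it gets used as the subscript. The sketch below uses an ordinary data.frame as a stand-in for that fallback, with made-up column names:

```r
# Assumption: the first column is named like a base function (here `c`)
df <- data.frame(c = 1:3, d = 4:6)
sym <- as.name(names(df)[[1]])

# With data.frame semantics, `j` is evaluated in the calling frame, so
# eval(sym) finds the base function `c`, not the column.
msg <- tryCatch(
    df[, eval(sym)],
    error = function(e) conditionMessage(e)
)
print(msg)  # an error message about an invalid subscript type
```

With a column name that matches no object at all, the same lookup would instead fail with "object not found", which is what I saw with column 'a'.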

Upvotes: 1

IRTFM

Reputation: 263479

A quick way which points to a longer way is this:

subset(foo, TRUE, date)

The subset function accepts unquoted symbols/names for its 'subset' and 'select' arguments. (Its author, however, thinks this was a bad idea and suggests we use formulas instead.) This was the jumping-off point for sections of Hadley Wickham's Advanced Programming webpages (and book): http://adv-r.had.co.nz/Computing-on-the-language.html and http://adv-r.had.co.nz/Functional-programming.html . You can also look at the code for subset.data.frame:

> subset.data.frame
function (x, subset, select, drop = FALSE, ...) 
{
    r <- if (missing(subset)) 
        rep_len(TRUE, nrow(x))
    else {
        e <- substitute(subset)
        r <- eval(e, x, parent.frame())
        if (!is.logical(r)) 
            stop("'subset' must be logical")
        r & !is.na(r)
    }
    vars <- if (missing(select)) 
        TRUE
    else {
        nl <- as.list(seq_along(x))
        names(nl) <- names(x)
        eval(substitute(select), nl, parent.frame())
    }
    x[r, vars, drop = drop]
}
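To see how the select branch resolves names, here is a small sketch that replays those lines by hand on a made-up data frame (`df`, `nl`, `x`, and the column names are all illustrative):

```r
df <- data.frame(
    date  = as.Date("2020-01-01") + 0:2,
    value = 1:3,
    flag  = c(TRUE, FALSE, TRUE)
)

# Map each column name to its position, as subset.data.frame does
nl <- as.list(seq_along(df))
names(nl) <- names(df)

# Unquoted names are evaluated against that list, so they turn into indices
one  <- eval(quote(date), nl)          # position of `date`
span <- eval(quote(date:value), nl)    # a range of positions

# A variable that is not a column name is looked up in the enclosing
# environment. subset() uses parent.frame() for this; at top level the
# current environment plays that role.
x <- "value"
by_string <- eval(quote(x), nl, environment())
res <- df[, by_string, drop = FALSE]
```

So passing a variable holding a column name to select can work, but only as long as no column shares the variable's name, since the column list is searched first.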

The problem with "naked" expressions passed into functions is that the frame in which they are evaluated is sometimes not the one you expect. R formulas, like functions, carry a pointer to the environment in which they were defined.
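A small base R sketch of that last point, with made-up names (`make_formula`, `local_value`): the formula remembers where it was created, so its right-hand side can still be evaluated there after the function has returned.

```r
make_formula <- function() {
    local_value <- 42   # exists only inside this call
    ~ local_value       # the formula records the environment it was created in
}

f <- make_formula()
env <- environment(f)              # the environment captured by the formula
val <- eval(f[[2]], envir = env)   # evaluate the right-hand side there

print(val)
```

A bare expression built with quote(local_value) carries no such environment, so whoever evaluates it has to supply the right frame themselves.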

Upvotes: 1
