Michael Lydeamore

Reputation: 58

Subsetting rows inside function in data.table

I am trying to subset rows of a data.table programmatically inside an R function. The following works as expected:

library(data.table)
dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))
dt[variable == "data"]
   id variable
1:  2     data
2:  3     data

If I define the function:

dtSubset <- function(df, col, str) {
  df[col == str]
}
dtSubset(dt, "variable", "data")

I get a 0-row table.

The following works:

dtSubset <- function(df, str) {
  df[variable == str]
}
dtSubset(dt, "data")

so the issue lies with selecting the column inside the function.
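A minimal sketch of why the first version returns 0 rows: it is the same function with one print() added (the print() is for illustration only), so you can see the condition that is actually compared.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

dtSubset <- function(df, col, str) {
  # Neither `col` nor `str` is a column of df, so both resolve to the
  # function's arguments: the comparison is "variable" == "data", i.e. FALSE
  print(col == str)
  # data.table evaluates the i expression the same way, and a single
  # FALSE selects no rows
  df[col == str]
}

res <- dtSubset(dt, "variable", "data")
nrow(res)
```

The string "variable" is never turned into a reference to the column, which is what the answers below address in different ways.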

I tried combinations of eval, substitute, quote and deparse, quoting and unquoting the column name to be passed in, each to no avail. I also tried out subset but ran into the same issues. The vignettes describe how to do this in j but not in i. Not sure if I've missed something obvious or whether I'm just thinking wrong, but how should I be going about doing this?

Upvotes: 3

Views: 835

Answers (4)

Frank

Reputation: 66819

> I tried combinations of eval, substitute, quote and deparse, quoting and unquoting the column name to be passed in, each to no avail.

You can do

subit <- function(d, cc, vv){
  ex = substitute( d[cc == vv], list(cc = as.name(cc), vv = vv) )
  print(ex)
  eval(ex)
}

subit(dt, "variable", "data")

d[variable == "data"]
   id variable
1:  2     data
2:  3     data

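as.name() (or its synonym as.symbol()) turns the string "variable" into a symbol, so the substituted expression refers to the column rather than to a character constant.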
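For comparison, a sketch of the same idea built with base R's bquote() instead of substitute(). The function name subit_bq is hypothetical; .() splices the column symbol and the value into the call, producing the same d[variable == "data"] expression that subit() prints.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

# Hypothetical variant of subit() using bquote()
subit_bq <- function(d, cc, vv) {
  # .(as.name(cc)) inserts the symbol `variable`, .(vv) inserts "data"
  ex <- bquote(d[.(as.name(cc)) == .(vv)])
  # Evaluate in this function's frame, where `d` is defined
  eval(ex)
}

res <- subit_bq(dt, "variable", "data")
res$id
```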

With this approach, you can take advantage of data.table's optimized "auto indexing". @sindri_baldur's answer also uses indices, by creating one (a key) and joining on it. A third alternative would be on-the-fly joining:

jit <- function(d, cc, vv) d[.(unique(vv)), on=cc, nomatch=0]
jit(dt, "variable", "data")

Some alternatives for this "subsetting join" are here: Perform a semi-join with data.table
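Because jit() joins on unique(vv), it can take several values at once. A small usage sketch (the value "missing" is made up to show a non-match): nomatch = 0 drops values with no matching row, and matched rows come back in the order of the values supplied.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

# Frank's on-the-fly join, unchanged
jit <- function(d, cc, vv) d[.(unique(vv)), on = cc, nomatch = 0]

# "missing" does not occur in dt, so it contributes no rows
res <- jit(dt, "variable", c("data", "fun", "missing"))
res$id
```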

Upvotes: 2

s_baldur

Reputation: 33613

One more option is using setkeyv() (the character-argument version of setkey()) inside the function:

dtSubset <- function(df, col, str) {
  setkeyv(df, col)[str]
}

dtSubset(dt, "variable", "data")
#    id variable
# 1:  2     data
# 2:  3     data
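One side effect worth knowing: setkeyv() sorts and keys its argument by reference, so the caller's table is modified. The sketch below shows that, plus a hypothetical non-mutating variant (dtSubsetCopy is an illustrative name) that keys a copy() instead, at the cost of copying the table on every call.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

dtSubset <- function(df, col, str) {
  setkeyv(df, col)[str]
}

res <- dtSubset(dt, "variable", "data")

# dt itself is now keyed on `variable` and re-sorted
key(dt)
dt$id

# Hypothetical variant that leaves the caller's table untouched
dtSubsetCopy <- function(df, col, str) {
  setkeyv(copy(df), col)[str]
}

dt2 <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))
res2 <- dtSubsetCopy(dt2, "variable", "data")
key(dt2)
```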

Upvotes: 1

Jaap

Reputation: 83275

You could also use get to make your function work:

dtSubset <- function(df, col, str) {
  df[get(col) == str]
}

Now dtSubset(dt, "variable", "data") will get you the intended result:

   id variable
1:  2     data
2:  3     data
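The get() approach extends naturally to matching several values. A sketch, assuming you want rows whose column value is any of a set (dtSubsetIn and vals are hypothetical names): swap == for %in%.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

# Hypothetical variant of the get() answer: %in% instead of ==
dtSubsetIn <- function(df, col, vals) {
  # get(col) fetches the column named by the string in `col`
  df[get(col) %in% vals]
}

res <- dtSubsetIn(dt, "variable", c("data", "fun"))
res$id
```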

Upvotes: 3

Ronak Shah

Reputation: 389275

If you want to pass the column name as a quoted string, you can extract the column with [[:

dtSubset <- function(df, col, str) {
     df[df[[col]] == str, ]
}

dtSubset(dt, "variable", "data")

#   id variable
#1:  2     data
#2:  3     data
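Since df[[col]] is plain base-R extraction, the same function also works on an ordinary data.frame, which is one reason to prefer it when the input type may vary. A quick sketch (df_plain is an illustrative name); the trailing comma in df[..., ] is what makes the data.frame case select rows, and data.table accepts it as well.

```r
# Same function as above; no data.table needed for this check
dtSubset <- function(df, col, str) {
  df[df[[col]] == str, ]
}

df_plain <- data.frame(id = 1:5, variable = c("test", "data", "data", "is", "fun"))
res <- dtSubset(df_plain, "variable", "data")
res$id
```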

Upvotes: 2
