Reputation: 58
I am trying to subset rows of a data.table programmatically inside an R function. The following works as expected:
library(data.table)
dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))
dt[variable == "data"]
id variable
1: 2 data
2: 3 data
If I define the function:
dtSubset <- function(df, col, str) {
df[col == str]
}
dtSubset(dt, "variable", "data")
I get a 0-row table.
The following works:
dtSubset <- function(df, str) {
df[variable == str]
}
dtSubset(dt, "data")
so the issue lies with selecting the column inside the function.
I tried combinations of eval, substitute, quote and deparse, quoting and unquoting the column name to be passed in, each to no avail. I also tried out subset but ran into the same issues.
The vignettes describe how to do this in j but not in i. Not sure if I've missed something obvious or whether I'm just thinking wrong, but how should I be going about doing this?
Upvotes: 3
Views: 835
Reputation: 66819
"I tried combinations of eval, substitute, quote and deparse, quoting and unquoting the column name to be passed in, each to no avail."
You can do
subit <- function(d, cc, vv){
ex = substitute( d[cc == vv], list(cc = as.name(cc), vv = vv) )
print(ex)
eval(ex)
}
subit(dt, "variable", "data")
d[variable == "data"]
id variable
1: 2 data
2: 3 data
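as.name (or as.symbol) turns the string "variable" into the symbol variable, so inside i it is looked up as a column rather than compared as a literal string.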
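To make the role of as.name more concrete, here is a minimal sketch using the dt from the question. The variable names col, sym, ex and res are only illustrative, and d = quote(dt) is substituted as well so the call can be evaluated outside a function.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))
col <- "variable"

# Without conversion, i compares the string "variable" with "data",
# which is the comparison behind the empty result reported in the question.
dt[col == "data"]

# as.name() produces the same symbol you would get by typing the column name.
sym <- as.name(col)
identical(sym, quote(variable))

# Build the full call with the symbol in place, then evaluate it.
ex <- substitute(d[cc == vv], list(d = quote(dt), cc = sym, vv = "data"))
res <- eval(ex)
res
```

Printing ex before evaluating it, as subit does, is a handy way to check that the expression is the one you would have typed by hand.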
With this approach, you can take advantage of data.table's optimized "auto indexing". @sindri_baldur's answer also uses indices by creating one and joining. A third alternative would be on-the-fly joining:
jit <- function(d, cc, vv) d[.(unique(vv)), on=cc, nomatch=0]
jit(dt, "variable", "data")
Some alternatives for this "subsetting join" are here: Perform a semi-join with data.table
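As a usage sketch of the join-based version, the same function should also accept several values at once, since vv is simply the table being joined on. The second value "fun" and the name out are only for illustration.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

# Same definition as above: join dt to the unique lookup values on column cc.
jit <- function(d, cc, vv) d[.(unique(vv)), on = cc, nomatch = 0]

# Rows come back grouped in the order of the lookup values.
out <- jit(dt, "variable", c("data", "fun"))
out
```

Because of unique(), passing a value twice does not duplicate the matching rows.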
Upvotes: 2
Reputation: 33613
One more option is to use setkeyv() inside the function:
dtSubset <- function(df, col, str) {
setkeyv(df, col)[str]
}
dtSubset(dt, "variable", "data")
# id variable
# 1: 2 data
# 2: 3 data
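One thing worth knowing about this version: setkeyv() works by reference, so the table passed in is sorted and keyed as a side effect. The sketch below shows this and a possible way around it using copy(); dtSubsetCopy is a hypothetical name, not part of the answer above.

```r
library(data.table)

dt <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))

dtSubset <- function(df, col, str) {
  setkeyv(df, col)[str]
}

res <- dtSubset(dt, "variable", "data")

# The caller's table is now keyed and reordered by "variable".
key(dt)
dt$id

# Hypothetical variant: key a copy so the input is left untouched,
# at the cost of duplicating the table in memory.
dtSubsetCopy <- function(df, col, str) {
  setkeyv(copy(df), col)[str]
}

dt2 <- data.table(id = 1:5, variable = c("test", "data", "data", "is", "fun"))
res2 <- dtSubsetCopy(dt2, "variable", "data")
key(dt2)
```

Whether the side effect matters depends on the caller; if the table is large and reordering is acceptable, the by-reference version avoids the copy.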
Upvotes: 1
Reputation: 83275
You could also use get to make your function work:
dtSubset <- function(df, col, str) {
df[get(col) == str]
}
Now dtSubset(dt, "variable", "data") will get you the intended result:
id variable
1: 2 data
2: 3 data
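A caveat with the get approach, sketched under the assumption that the table might contain a column whose name matches one of the function's arguments: inside i, column names take precedence over function arguments, so str would then refer to the column rather than the value passed in. The table dt2 and the function dtSubsetSafe with its dotted argument names are hypothetical.

```r
library(data.table)

dtSubset <- function(df, col, str) {
  df[get(col) == str]
}

# A table that happens to have a column called "str".
dt2 <- data.table(id = 1:3, str = c("a", "b", "c"))

# Here str inside i resolves to the column, so the column is compared
# with itself and every row is returned.
res <- dtSubset(dt2, "str", "a")
nrow(res)

# Hypothetical workaround: argument names unlikely to clash with columns.
dtSubsetSafe <- function(df, .col, .val) {
  df[get(.col) == .val]
}

res_safe <- dtSubsetSafe(dt2, "str", "a")
res_safe
```

This only matters when column names are not under your control, but it is cheap to guard against.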
Upvotes: 3
Reputation: 389275
If you want to pass the column name as a quoted string, you can extract the column with [[ and compare it directly:
dtSubset <- function(df, col, str) {
df[df[[col]] == str, ]
}
dtSubset(dt, "variable", "data")
# id variable
#1: 2 data
#2: 3 data
Upvotes: 2