overdisperse

Subscript out of bounds when using conditional by in data.table

I want to list unique IDs within groups, where the grouping variable can be selected by the user. The following works:

if (useGroupVar1) {
  dt[, unique(id), .(group1a, group1b, group1c)]
} else {
  dt[, unique(id), group2]
}

The expressions I'm using in my code to filter rows are actually fairly long so I want to avoid duplicating code. I came up with this "solution", which doesn't actually work:

dt[,unique(id),if(useGroupVar1){.(group1a,group1b,group1c)}else{group2}]

If the condition leads to using group2 alone, it works (though the column is called if), but trying to get it to use .(group1a,group1b,group1c) results in

Error in eval(expr, envir, enclos) : could not find function "."

I have read that .() is an alias for list(), but using list() instead gives me this:

Error in bysubl[[jj + 1L]] : subscript out of bounds

Is there a way to implement a conditional by without duplicating entire expressions?

Answers (1)

talat

Just personal preference, but I don't like pasting strings in a by= statement of a data.table (not very readable to me).

Instead, I would use a user-selected variable (var) and create a list of grouping variables. Then, you can easily select the variables like so:

groupVars <- list(
  GroupVar1 = c("group1a","group1b","group1c"),
  GroupVar2 = c("groupXYZ", "groupABC"),
  GroupVarX = "group2"
)

# Suppose the user selects one set by name, for example:
var <- "GroupVar2"

dt[, unique(id), by = groupVars[[var]]]
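To make this concrete, here is a minimal sketch on made-up data. The column names mirror the question, but the values in dt are purely illustrative assumptions, and groupVars is trimmed to the two sets whose columns exist in this toy table.

```r
library(data.table)

# Toy data (assumed for illustration; only the column names come from the question)
dt <- data.table(
  id      = c(1, 1, 2, 3, 3, 4),
  group1a = c("a", "a", "a", "b", "b", "b"),
  group1b = c("x", "x", "y", "y", "y", "y"),
  group1c = c("p", "p", "p", "p", "q", "q"),
  group2  = c("g1", "g1", "g1", "g2", "g2", "g2")
)

# Named sets of grouping columns, each stored as a character vector
groupVars <- list(
  GroupVar1 = c("group1a", "group1b", "group1c"),
  GroupVarX = "group2"
)

# Group by the single column in GroupVarX
var <- "GroupVarX"
resX <- dt[, unique(id), by = groupVars[[var]]]

# Same j expression, now grouped by the three columns in GroupVar1
var <- "GroupVar1"
res1 <- dt[, unique(id), by = groupVars[[var]]]

print(resX)
print(res1)
```

The j expression is written only once; switching the grouping is just a matter of changing var.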

As a side note:

You can easily extend this kind of variable selection to situations where a user is allowed to select multiple sets of grouping variables. In such cases, you could do it as follows:

Assume that the user-selected variable is now:

var <- c("GroupVar1", "GroupVarX") # two groups selected

Then, the by= statement becomes:

dt[, unique(id), by = unlist(groupVars[var], use.names=FALSE)]
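As a quick check of what that by= argument evaluates to, here is a small sketch, again on made-up data (the values in dt are assumptions for illustration). The unique() wrapper and the byCols helper variable are my additions, not part of the approach above; they only make sure a column name is not passed twice if two selected sets happen to share a column.

```r
library(data.table)

# Toy data (assumed for illustration)
dt <- data.table(
  id      = c(1, 1, 2, 3),
  group1a = c("a", "a", "a", "b"),
  group1b = c("x", "x", "y", "y"),
  group1c = c("p", "p", "p", "p"),
  group2  = c("g1", "g1", "g2", "g2")
)

groupVars <- list(
  GroupVar1 = c("group1a", "group1b", "group1c"),
  GroupVarX = "group2"
)

var <- c("GroupVar1", "GroupVarX")  # two sets selected

# Flatten the selected sets into one character vector of column names;
# unique() drops a column that appears in more than one set
byCols <- unique(unlist(groupVars[var], use.names = FALSE))
print(byCols)

res <- dt[, unique(id), by = byCols]
print(res)
```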
