David Robie

Reputation: 435

R: dplyr filtering inside of a function with a NULL input

What is the correct way to perform an inline conditional check for a filter which ignores a NULL input argument?

I was recently shown a clean way to do inline conditional filtering with dplyr, and I'd like to apply it inside a function where one or more inputs may be NULL. If an argument is provided, the function should filter on it; if it is NULL, that filter should be skipped. In this case, data will just be iris %>% tibble(). In the past, I would do this in an unwieldy manner:

library(dplyr)
library(magrittr)  # provides the %<>% assignment pipe

testfun <- function(data, range = NULL, spec = NULL){
  if(!is.null(range)) {
    data %<>% filter(between(Petal.Length, range[1], range[2]))
  }
  
  if(!is.null(spec)) {
    data %<>% filter(Species %in% spec)
  }
  
  return(data)
}

My attempt at inline conditional checks looks like this:

testfun <- function(data, range = NULL, spec = NULL){
  data %>%
    filter(
      if(!is.null(range)) {between(Petal.Length, range[1], range[2])},
      if(!is.null(spec)) {Species %in% spec},
    )
}

This works as long as I provide inputs for range and spec. However, if I leave one of them null, I get an error message such as:

Error in 'filter()':
ℹ In argument: 'if (...) NULL'.
Caused by error:
! '..2' must be of size 150 or 1, not size 0.

Upvotes: 1

Views: 84

Answers (2)

David Robie

Reputation: 435

User lroha commented on the post with what I believe to be the correct answer, but they have not posted it as an answer.

You can't pass NULL to filter() so just add an ... else TRUE to your condition statements.
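The reason is that an `if` without an `else` evaluates to NULL when its condition is FALSE, and NULL has size 0, which is what the error message complains about. A minimal base-R sketch of that behaviour (the variable names here are only illustrative):

```r
# An `if` with no `else` yields NULL when the condition is FALSE.
range <- NULL

cond_missing <- if (!is.null(range)) "would filter"
cond_default <- if (!is.null(range)) "would filter" else TRUE

is.null(cond_missing)  # the zero-length value that filter() would receive
length(cond_missing)   # size 0, hence the "not size 0" error
cond_default           # a length-1 logical, which filter() recycles to every row
```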

So, instead of:

testfun <- function(data, range = NULL, spec = NULL){
  data %>%
    filter(
      if(!is.null(range)) {between(Petal.Length, range[1], range[2])},
      if(!is.null(spec)) {Species %in% spec},
    )
}

It should be:

testfun <- function(data, range = NULL, spec = NULL){
  data %>%
    filter(
      if(!is.null(range)) {between(Petal.Length, range[1], range[2])} else TRUE,
      if(!is.null(spec)) {Species %in% spec} else TRUE,
    )
}
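A quick check of the corrected version, assuming dplyr is installed and using the built-in iris data. The function is repeated so the snippet is self-contained, and `testfun_fixed` is a hypothetical name chosen to avoid clashing with the versions above:

```r
library(dplyr)

# Same logic as the corrected function: `else TRUE` keeps every row
# when an argument is left as NULL.
testfun_fixed <- function(data, range = NULL, spec = NULL) {
  data %>%
    filter(
      if (!is.null(range)) between(Petal.Length, range[1], range[2]) else TRUE,
      if (!is.null(spec)) Species %in% spec else TRUE
    )
}

dat <- as_tibble(iris)

n_all   <- nrow(testfun_fixed(dat))                                    # no filters
n_spec  <- nrow(testfun_fixed(dat, spec = "setosa"))                   # Species only
n_range <- nrow(testfun_fixed(dat, range = c(1, 2)))                   # Petal.Length only
n_both  <- nrow(testfun_fixed(dat, range = c(1, 2), spec = "setosa"))  # both filters

c(n_all, n_spec, n_range, n_both)
```

With no arguments the full data set comes back unchanged, and each filter can be switched on independently.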

Upvotes: 0

G. Grothendieck

Reputation: 269421

Set the defaults in the argument list to values that cause the filter expressions to evaluate to TRUE for every row:

testfun2 <- function(data, range = c(-Inf, Inf), spec = data$Species) {
  data %>%
    filter(
      between(Petal.Length, range[1], range[2]),
      Species %in% spec
    )
}

or keep them as NULL in the argument list and reset them inside the function:

testfun3 <- function(data, range = NULL, spec = NULL) {
  range <- range %||% c(-Inf, Inf)
  spec <- spec %||% data$Species
  data %>%
    filter(
      between(Petal.Length, range[1], range[2]),
      Species %in% spec
    )
}
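Note that `%||%` is not part of dplyr: it comes from rlang (purrr re-exports it) and is in base R from version 4.4.0. If none of those is available, a one-line fallback with the same semantics can be defined. This is a sketch, not the rlang source:

```r
# Null-default operator: return `y` only when `x` is NULL.
`%||%` <- function(x, y) if (is.null(x)) y else x

default_range  <- NULL %||% c(-Inf, Inf)     # falls back to the default range
supplied_range <- c(1, 2) %||% c(-Inf, Inf)  # keeps the supplied range

default_range
supplied_range
```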

Another possibility is to use NA rather than NULL as the default and incorporate the missing-value check in the conditions:

testfun4 <- function(data, range = NA, spec = NA) {
  data %>%
    filter(
      # all() collapses the NA check to a single TRUE/FALSE for vector inputs
      all(is.na(range)) | between(Petal.Length, range[1], range[2]),
      all(is.na(spec)) | Species %in% spec
    )
}

Upvotes: 2
