RomanB

Reputation: 180

Check expression argument of function

When writing functions, it is important to check the types of the arguments. For example, take the following (not necessarily useful) function, which performs subsetting:

data_subset = function(data, date_col) {

      if (!TRUE %in% (is.character(date_col) | is.expression(date_col))){
        stop("Input variable date is of wrong format")
      }

      if (is.character(date_col)) {
        x <- match(date_col, names(data))
      } else  x <- match(deparse(substitute(date_col)), names(data))

        sub <- data[,x]
}

I would like to allow the user to provide the column which should be extracted as character or expression (e.g. a column called "date" vs. just date). At the beginning I would like to check that the input for date_col is really either a character value or an expression. However, 'is.expression' does not work:

Error in match(x, table, nomatch = 0L) : object '...' not found

Since deparse(substitute()) works when an expression is supplied, I thought 'is.expression' would work as well. What is wrong here? Can anyone give me a hint?

Upvotes: 2

Views: 153

Answers (1)

CL.

Reputation: 14997

I think you are not looking for is.expression but for is.name.

The tricky part is to inspect date_col without evaluating it: is.character should be called only if the argument is not a name. Calling is.character on a bare name forces the argument to be evaluated, which typically fails because no object of that name is defined.

To do this, short-circuit evaluation can be used. In

if(!(is.name(substitute(date_col)) || is.character(date_col)))

is.character is only called if is.name returns FALSE.
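
As a small illustration of the point above, the following sketch reports how an argument was supplied. The helper name arg_kind is made up for this example and is not part of the question or of base R. It also shows why is.expression does not help: it only recognises objects built with expression(), not bare symbols.

```r
# Hypothetical helper, for illustration only: reports how the argument was written.
arg_kind <- function(x) {
  expr <- substitute(x)      # the argument as typed by the caller, unevaluated
  if (is.name(expr)) {
    "name"                   # bare symbol such as col1; x itself is never evaluated
  } else if (is.character(x)) {
    "character"              # x is a constant or a call here, so it is evaluated normally
  } else {
    "other"
  }
}

arg_kind(col1)     # "name", even if no object called col1 exists
arg_kind("col1")   # "character"
arg_kind(1)        # "other"

# is.expression() tests for objects created by expression(), not for symbols:
is.expression(quote(col1))        # FALSE
is.expression(expression(col1))   # TRUE
```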

Your function boils down to:

data_subset = function(data, date_col) {

  if(!(is.name(substitute(date_col)) || is.character(date_col))) {
     stop("Input variable date is of wrong format") 
  }

  date_col2 <- as.character(substitute(date_col))
  return(data[, date_col2])
}

Of course, you could use if(is.name(…)) to convert only to character when date_col is a name.
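
A minimal sketch of that variant might look like the following. The name data_subset2 and the small test data frame are assumptions made for this example only.

```r
# Sketch of the variant described above; data_subset2 is a made-up name.
data_subset2 <- function(data, date_col) {
  expr <- substitute(date_col)
  if (is.name(expr)) {
    # Bare symbol: use the symbol's own name as the column name.
    date_col <- as.character(expr)
  } else if (!is.character(date_col)) {
    stop("Input variable date is of wrong format")
  }
  data[, date_col]
}

df <- data.frame(col1 = 1:3, col2 = 4:6)   # hypothetical example data
data_subset2(df, col1)                # column col1
data_subset2(df, "col2")              # column col2
data_subset2(df, paste0("col", 1))    # column col1: the call is evaluated to "col1"
```

One side effect of converting only in the name case is that a call producing a character value, such as paste0("col", 1), is evaluated rather than deparsed.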

This works:

testDF <- data.frame(col1 = rnorm(10), col2 = rnorm(10, mean = 10), col3 = rnorm(10, mean = 50), rnorm(10, mean = 100))

data_subset(testDF, "col1") # ok
data_subset(testDF, col1) # ok
data_subset(testDF, 1) # Error in data_subset(testDF, 1) : Input variable date is of wrong format

However, I don't think you should do this. Consider the following example:

var <- "col1"
data_subset(testDF, var) #  Error in `[.data.frame`(data, , date_col2) : undefined columns selected

col1 <- "col2"
data_subset(testDF, col1) # Returns the contents of col1, not col2.

Though this "works as designed", it is confusing: unless they read your function's documentation carefully, users would expect to get col1 in the first case and col2 in the second.

Abusing a famous quote:

Some people, when confronted with a problem, think “I know, I'll use non-standard evaluation.” Now they have two problems.

Hadley Wickham in Non-standard evaluation:

Non-standard evaluation allows you to write functions that are extremely powerful. However, they are harder to understand and to program with. As well as always providing an escape hatch, carefully consider both the costs and benefits of NSE before using it in a new domain.

Unless you expect large benefits from letting users skip the quotes around the column name, don't do it.
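
For comparison, a version that relies on standard evaluation only could be sketched as follows. The name data_subset_se, the error messages and the example data are assumptions for illustration, not something from the question.

```r
# Sketch only; data_subset_se is a hypothetical name.
data_subset_se <- function(data, date_col) {
  if (!is.character(date_col) || length(date_col) != 1L) {
    stop("date_col must be a single column name given as a character string")
  }
  if (!date_col %in% names(data)) {
    stop("Column '", date_col, "' not found in data")
  }
  data[[date_col]]
}

df <- data.frame(col1 = 1:3, col2 = 4:6)   # hypothetical example data
var <- "col2"
data_subset_se(df, "col1")   # column col1
data_subset_se(df, var)      # column col2, because var is evaluated as usual
```

Here a variable holding a column name behaves the way most users would expect, at the cost of typing the quotes.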

Upvotes: 1
