user697473

Reputation: 2293

make function detect nonexistent column when specified as df$x

I have functions that operate on a single vector (for example, a column in a data frame). I want users to be able to use $ to specify the columns that they pass to these functions; for example, I want them to be able to write myFun(df$x), where df is a data frame. But in such cases, I want my functions to detect when x isn't in df. How may I do this?

Here is a minimal illustration of the problem:

myFun <- function (x) sum(x)
data(iris)
myFun(iris$Petal.Width)  # returns 179.9
myFun(iris$XXX)          # returns 0

I don't want the last line to return 0. I want it to throw an error message, as XXX isn't a column in iris. How may I do this?

One way is to run as.character(match.call()) inside the function. I could then use the parts of the resulting string to determine the name of df, and in turn, I could check for the existence of x. But this seems like a not-so-robust solution.

It won't suffice to throw an error whenever x has length 0: I want to detect whether the vector exists, not whether it has length 0.

I searched for related posts on Stack Overflow, but I didn't find any.

Upvotes: 1

Views: 54

Answers (1)

akrun

Reputation: 886938

iris$XXX returns NULL, and that NULL is then passed to sum:

sum(NULL)
#[1] 0

Note that both iris$XXX and iris[['XXX']] return NULL for a column that does not exist. If an error is needed instead, either subset or dplyr::select will raise one:

library(dplyr)

iris %>%
  select(XXX)

Error: Can't subset columns that don't exist.
✖ Column XXX doesn't exist.
Run rlang::last_error() to see where the error occurred.

Or with pull

iris %>%
  pull(XXX)

Error: object 'XXX' not found
Run rlang::last_error() to see where the error occurred.

Or with base R's subset

subset(iris, select = XXX)

Error in eval(substitute(select), nl, parent.frame()) : object 'XXX' not found
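For comparison, here is a minimal base R sketch of the same contrast without any packages. It assumes only the built-in iris data; the variable name res is just for illustration. Single-bracket indexing by name fails on a missing column, while $ and [[ quietly return NULL.

```r
# Single-bracket indexing by name raises an error for a missing column;
# capture the error message instead of stopping the script.
res <- tryCatch(
  iris[, "XXX"],
  error = function(e) conditionMessage(e)
)
print(res)

# The `$` and `[[` accessors do not raise an error; they return NULL.
is.null(iris$XXX)
is.null(iris[["XXX"]])
```

So the choice of accessor decides whether a missing column shows up as an error or as a silent NULL.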

We could make the function throw an error whenever NULL is passed. Because of the way the function takes its argument, it receives only the value and no information about the object that the value came from.

myFun <- function(x) {
  stopifnot(!is.null(x))
  sum(x)
}

However, this error would be non-specific, because NULL can reach the function in other ways as well, e.g. when the caller passes a value that is itself NULL rather than a column that is missing.
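If the df$x calling style has to be kept, one possible workaround is to look at the unevaluated argument with substitute() and check the column name only when the argument has the form data$column. The sketch below is an assumption-laden illustration, not a robust solution: myFun3 is a hypothetical name, the check uses exact name matching (so it also rejects names that $ would otherwise partially match), and it will not see the original expression if myFun3 is called through a wrapper function.

```r
myFun3 <- function(x) {
  # Capture the argument as written by the caller, e.g. iris$XXX
  expr <- substitute(x)

  # Only inspect arguments of the form data$column
  if (is.call(expr) && identical(expr[[1]], as.name("$"))) {
    data  <- eval(expr[[2]], parent.frame())  # the object left of `$`
    colnm <- as.character(expr[[3]])          # the name right of `$`

    # Exact-match check against the column names of a data frame
    if (is.data.frame(data) && !(colnm %in% names(data))) {
      stop("column '", colnm, "' not found in ",
           paste(deparse(expr[[2]]), collapse = ""), call. = FALSE)
    }
  }

  sum(x)
}

myFun3(iris$Petal.Width)  # same value as sum(iris$Petal.Width)
myFun3(c(1, 2, 3))        # plain vectors are passed through unchanged

# A missing column now produces a specific error message
msg <- tryCatch(myFun3(iris$XXX), error = function(e) conditionMessage(e))
print(msg)
```

This relies on how the caller happened to write the argument, which is why it is fragile.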

If we need to check whether the column is valid, then the data and the column name should be passed to the function as separate arguments:

myFun2 <- function(data, colnm) {
  stopifnot(exists(colnm, data))
  sum(data[[colnm]])
}

myFun2(iris, 'XXX')
#Error in myFun2(iris, "XXX") : exists(colnm, data) is not TRUE
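A variant of the same idea can give a more descriptive message by testing the name against names(data) and calling stop() directly. This is only a sketch; myFun2b is a hypothetical name and the wording of the messages is an assumption.

```r
myFun2b <- function(data, colnm) {
  # Guard against inputs that are not data frames
  if (!is.data.frame(data)) {
    stop("'data' must be a data frame", call. = FALSE)
  }
  # Exact-match check of the requested column name
  if (!(colnm %in% names(data))) {
    stop("column '", colnm, "' does not exist in the data", call. = FALSE)
  }
  sum(data[[colnm]])
}

myFun2b(iris, "Petal.Width")  # same value as sum(iris$Petal.Width)

# A missing column yields an error that names the offending column
msg2 <- tryCatch(myFun2b(iris, "XXX"), error = function(e) conditionMessage(e))
print(msg2)
```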

Upvotes: 2
