Reputation: 2293
I have functions that operate on a single vector (for example, a column in a data frame). I want users to be able to use $ to specify the columns that they pass to these functions; for example, I want them to be able to write myFun(df$x), where df is a data frame. But in such cases, I want my functions to detect when x isn't in df. How may I do this?
Here is a minimal illustration of the problem:
myFun <- function (x) sum(x)
data(iris)
myFun(iris$Petal.Width) # returns 179.9
myFun(iris$XXX) # returns 0
I don't want the last line to return 0. I want it to throw an error, as XXX isn't a column in iris. How may I do this?
One way is to run as.character(match.call()) inside the function. I could then use the parts of the resulting string to determine the name of df, and in turn, I could check for the existence of x. But this seems like a not-so-robust solution.
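A minimal sketch of that idea, using substitute() instead of match.call() to capture the unevaluated argument. The name myFunChecked is hypothetical, and the check only recognizes arguments written literally as df$x; anything else is passed through to sum() unchecked. It also requires an exact column name, whereas $ itself allows partial matching on data frames.

```r
# Hypothetical helper: inspects the unevaluated argument expression.
myFunChecked <- function(x) {
  expr <- substitute(x)                      # e.g. the call iris$XXX
  if (is.call(expr) && identical(expr[[1]], as.name("$"))) {
    df  <- eval(expr[[2]], parent.frame())   # the object left of `$`
    col <- as.character(expr[[3]])           # the name right of `$`
    if (!col %in% names(df)) {
      stop("column '", col, "' not found in ", deparse(expr[[2]]),
           call. = FALSE)
    }
  }
  sum(x)
}

myFunChecked(iris$Petal.Width)   # sums the column as before
# myFunChecked(iris$XXX)         # would stop with an error
```

This remains fragile in the way described above: it depends on how the caller happened to write the argument, so a value stored in a variable first would bypass the check.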
It won't suffice to throw an error whenever x has length 0: I want to detect whether the vector exists, not whether it has length 0.
I searched for related posts on Stack Overflow, but I didn't find any.
Upvotes: 1
Views: 54
Reputation: 886938
iris$XXX returns NULL, and that NULL is passed to sum:
sum(NULL)
#[1] 0
Note that both iris$XXX and iris[['XXX']] return NULL. If we need an error instead, either subset or dplyr::select raises one:
library(dplyr)
iris %>%
  select(XXX)
Error: Can't subset columns that don't exist.
✖ Column `XXX` doesn't exist.
Run `rlang::last_error()` to see where the error occurred.
Or with pull:
iris %>%
  pull(XXX)
Error: object 'XXX' not found
Run `rlang::last_error()` to see where the error occurred.
subset(iris, select = XXX)
Error in eval(substitute(select), nl, parent.frame()) : object 'XXX' not found
We could make the function throw an error if NULL is passed. Given the way the function takes its argument, it receives only the value and no information about the object that value came from.
myFun <- function (x) {
  stopifnot(!is.null(x))
  sum(x)
}
However, this would be a non-specific error, because NULL can reach the function in other ways as well, e.g. when the element exists but its value is NULL.
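To illustrate why a NULL check alone is ambiguous, here is a small sketch. It uses a plain list rather than a data frame, which is an assumption made for illustration, since a list can hold an element whose value is NULL. The names lst, a, b and zzz are hypothetical.

```r
# A list (unlike a data frame) can hold an element whose value is NULL.
lst <- list(a = NULL, b = 1:3)

is.null(lst$a)      # TRUE: "a" exists but holds NULL
is.null(lst$zzz)    # TRUE: "zzz" does not exist at all

# Only the names reveal which case applies.
"a" %in% names(lst)     # TRUE
"zzz" %in% names(lst)   # FALSE
```

Both lookups produce the same value, so a function that receives only that value cannot tell the two cases apart.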
If we need to check whether the column is valid, then the data and the column name should be passed in as separate arguments:
myFun2 <- function(data, colnm) {
  stopifnot(exists(colnm, data))
  sum(data[[colnm]])
}
myFun2(iris, 'XXX')
#Error in myFun2(iris, "XXX") : exists(colnm, data) is not TRUE
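If a more descriptive message is wanted, one possible variant checks the name against names(data) and calls stop() directly. This is only a sketch: myFun3 and the wording of the message are hypothetical, not part of any package.

```r
# Hypothetical variant of myFun2 with a more descriptive error message.
myFun3 <- function(data, colnm) {
  if (!colnm %in% names(data)) {
    stop("column '", colnm, "' not found in data", call. = FALSE)
  }
  sum(data[[colnm]])
}

myFun3(iris, "Petal.Width")   # sums the column
# myFun3(iris, "XXX")         # would stop: column 'XXX' not found in data
```

Checking names(data) requires an exact match, so a partial name such as "Petal.W" would also be rejected.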
Upvotes: 2