jokroese

Reputation: 45

How do I return a data-variable in an R function?

What I am trying to do

I am trying to write a function that returns the names of certain variables of a dataset. For a test tibble test <- tibble(x1 = 1:3, x2=2:4, x3=3:5, x4=4:6), I want a function

assign_predictors_argument <- function(dataset, outcome, predictors) {
  ...
}

such that:

  1. if the argument predictors is not supplied, predictors is set to all variables in dataset apart from outcome. E.g. assign_predictors_argument(test, x1) will return c(x2, x3, x4).
  2. if the argument predictors is supplied, the function returns it unchanged. E.g. assign_predictors_argument(test, x1, c(x2, x3)) will return c(x2, x3).

What I have tried

assign_predictors_argument <- function(dataset, outcome, predictors) {
  if(missing(predictors)) {
    predictors <- dataset %>%
      dplyr::select( -{{ outcome }} ) %>%
      names()
  }
  predictors
}

What went wrong

Case 1: predictors argument missing

assign_predictors_argument(test, x1) gives the character vector "x2" "x3" "x4". However, I want it to return c(x2, x3, x4).

How do I convert this character vector to a form like the input?

Case 2: predictors argument defined

assign_predictors_argument(test, x1, c(x2, x3)) gives

Error in assign_predictors_argument(test, x1, x2) : 
  object 'x2' not found

It appears that the last line of the function evaluates predictors before returning it. As x2 is not defined in the calling environment, this raises an error.

I have tried a) changing the final line to {{predictors}} and b) changing missing(predictors) to is.null(predictors) with a default of predictors = NULL (following this). Neither has worked.

How can I return the value of predictors without either a) changing its form or b) evaluating it?

Upvotes: 3

Views: 517

Answers (2)

Artem Sokolov

Reputation: 13691

You were close:

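assign_predictors_argument <- function(dataset, outcome, predictors) {
  if (missing(predictors)) {
    # Default: build c(<all columns except outcome>) as an unevaluated call
    dataset %>%
      dplyr::select(-{{ outcome }}) %>%
      names() %>%
      {rlang::expr(c(!!!rlang::syms(.)))}
  } else {
    # Otherwise return the caller's expression without evaluating it
    rlang::enexpr(predictors)
  }
}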

assign_predictors_argument(test, x1)
# c(x2, x3, x4)
assign_predictors_argument(test, x1, c(x2, x3))
# c(x2, x3)

In the above, rlang::expr() constructs the expression that you want by 1) converting names to symbols with syms() and 2) splicing them together inside the c(...) expression with the unquote-splice operator !!!.

For the second portion, you can simply capture the expression supplied by the user with rlang::enexpr().
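To see each piece on its own, here is a minimal sketch that runs the two steps outside the function. The names nms, sym_list, built, capture and captured are only illustrative, and rlang is assumed to be installed:

```r
library(rlang)

# Column names as a plain character vector (what names() returns)
nms <- c("x2", "x3", "x4")

# Step 1: turn each string into a symbol; syms() returns a list of symbols
sym_list <- syms(nms)

# Step 2: splice the symbols into an unevaluated call to c()
built <- expr(c(!!!sym_list))
print(built)
#> c(x2, x3, x4)

# enexpr() captures what the caller typed, without evaluating it,
# so x2 and x3 do not need to exist anywhere
capture <- function(predictors) enexpr(predictors)
captured <- capture(c(x2, x3))
print(captured)
#> c(x2, x3)
```

Both results are call objects, not character vectors, which is why they print without quotes.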

Upvotes: 2

Allan Cameron

Reputation: 173793

You say you want to return something like c(x2, x3, x4). Let's first be clear what this object is. It is an unevaluated call to the function c. It is not a vector of names. You will be able to use it in tidy evaluation, but it will require the !! operator.
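A quick base R sketch of what "an unevaluated call" means in practice. It uses quote() to build the same kind of object by hand; the variable name vars is just for illustration, and all.vars() is shown only as one possible way back to plain strings:

```r
# quote() returns the expression unevaluated, so x2, x3 and x4 need not exist
vars <- quote(c(x2, x3, x4))

is.call(vars)       # TRUE: it is a call object, not a vector
is.character(vars)  # FALSE
vars[[1]]           # the function being called: the symbol `c`
all.vars(vars)      # the variable names as strings: "x2" "x3" "x4"
```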

This is quite tricky to achieve. You need to capture the predictors argument and ensure it is either a single variable name or a call to c. Any other expression passed to predictors should probably throw an error.

If predictors is missing and you are getting the column names as characters, then you must convert these to names with as.name and stick them in a c call. If predictors is a single variable, it must be returned unevaluated. If it is a c call, it should also be returned unevaluated. Otherwise an error is thrown.

So the function might look something like this:

assign_predictors_argument <- function(dataset, outcome, predictors) {
  if (missing(predictors)) {
    # Default: every column except the outcome, as a list of symbols
    predictors <- dataset %>%
      dplyr::select(-{{ outcome }}) %>%
      names() %>%
      lapply(as.name)
    # Wrap the symbols in an unevaluated call to c()
    predictors <- as.call(c(quote(c), predictors))
  } else {
    # Capture the argument exactly as the caller typed it
    predictors <- as.list(match.call())$predictors
    # A call is accepted only if the function being called is c
    if (is.call(predictors) && !identical(predictors[[1]], quote(c))) {
      stop("'predictors' must be either a single variable or vector of names")
    }
  }
  predictors
}

So let's test it out:

test <- dplyr::tibble(x1 = 1:3, x2 = 2:4, x3 = 3:5, x4 = 4:6)

# Test with missing predictors
assign_predictors_argument(test, x1)
#> c(x2, x3, x4)

# Test with single predictor
assign_predictors_argument(test, x1, x2)
#> x2

# Test with multiple predictors
assign_predictors_argument(test, x1, c(x3, x4))
#> c(x3, x4)

# Test with call other than call to c
assign_predictors_argument(test, x1, as.name("x3"))
#> Error in assign_predictors_argument(test, x1, as.name("x3")): 
#>  'predictors' must be either a single variable or vector of names

This all looks correct. So to use it, we might do something like this:

vars <- assign_predictors_argument(test, x1, c(x2, x4))

vars
#> c(x2, x4)

test %>% select(!!vars)
#> # A tibble: 3 x 2
#>      x2    x4
#>   <int> <int>
#> 1     2     4
#> 2     3     5
#> 3     4     6

Created on 2020-07-10 by the reprex package (v0.3.0)

Upvotes: 1
