EconomiCurtis

Reputation: 2227

dplyr mutate, applying functions that incorporate variables (or objects, vectors) outside the data frame of interest

Problem

I am curious how I might call a function inside dplyr's mutate when that function requires an argument that is a vector I define elsewhere, outside the data frame.

My example is somewhat abstracted from what I am actually trying to do. In the real case I take several columns, check each row for the presence of values that depend on the Date of the table, and then return a string classification. For the sake of brevity, the following example produces the same error and will hopefully suffice.

Setup

DF <- data.frame(
  Index = 1:100, 
  Num1 = runif(100,0,100) %/% 1
  )

# array to check
CheckArray = seq(0,100, by = 2)

f <- function(x, ArrayToCheck){
  if (x %in% ArrayToCheck){
    return(T)
  } else {
    return(F)
  }
}

My attempt

DF <- dplyr::mutate(
  DF,
  Num1_even = f(Num1, CheckArray)
  )

which of course fails, with the following warning:

Warning message: In if (x %in% ArrayToCheck) { : the condition has length > 1 and only the first element will be used

Extra note

I should point out that I am aware my example could be solved other ways without a function, e.g.

dplyr::mutate(
  DF,
  Num1_even = Num1 %in% CheckArray
)

or

dplyr::mutate(
     DF,
     Num1_even = Num1 %in% seq(0,100, by = 2)
)

But in this case and many others, I find it valuable to define a vector outside of my DF and then apply a function to each row, along with multiple additional arguments.

I have also seen this solved via the apply family of functions, but I was hoping there was a method in dplyr, since it's so fast and has such nice syntax.


Perhaps we can get the Hadleyverse to add an operator that tells dplyr to step outside the scope of the current data.frame, e.g.

CheckArray = seq(0,100, by = 2)

DF <- dplyr::mutate(
  DF,
  Num1_even = f(Num1, %o%CheckArray%o%)
  )

Upvotes: 1

Views: 3033

Answers (1)

shadow

Reputation: 22313

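This has nothing to do with the CheckArray vector; mutate finds it in the calling environment without any special operator. The problem is that if is not vectorized in R: it expects a single TRUE or FALSE, while mutate passes the entire Num1 column to f. You can use ifelse instead, and then your call should work. Check out ?ifelse for more information.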

f <- function(x, ArrayToCheck){
  ifelse(x %in% ArrayToCheck, TRUE, FALSE)
}

dplyr::mutate(
  DF,
  Num1_even = f(Num1, CheckArray)
)
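To see the element-by-element behaviour outside of mutate, it may help to call the function directly on a short vector. The sketch below is only illustrative: the input x, the helper classify and its "in array" / "not in array" labels are made up for this example, standing in for the string classification mentioned in the question.

```r
CheckArray <- seq(0, 100, by = 2)
x <- c(3, 4, 10, 11)  # made-up input, not from the question

# Same f as above: ifelse() tests every element of x
f <- function(x, ArrayToCheck) {
  ifelse(x %in% ArrayToCheck, TRUE, FALSE)
}
flags <- f(x, CheckArray)

# Hypothetical variant returning a string classification instead
classify <- function(x, ArrayToCheck) {
  ifelse(x %in% ArrayToCheck, "in array", "not in array")
}
classes <- classify(x, CheckArray)

print(flags)
print(classes)
```

Both functions return one value per element of x, which is what mutate needs in order to fill a new column.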

Of course in this case the ifelse is not actually needed either (see below). If your desired output consists only of TRUE and FALSE, you can skip the ifelse, but I added ifelse in case your actual example is more complicated than that.

f <- function(x, ArrayToCheck){
  x %in% ArrayToCheck
}
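If the real function is hard to vectorize and has to stay scalar (a plain if / else on one value), one option is to apply it element by element inside mutate with vapply. This is a rough sketch rather than the only way to do it: the small DF, the function classify_one and its labels are invented for illustration, and the .env pronoun is assumed to be available (dplyr 1.0.0 or later).

```r
library(dplyr)

# Small made-up data frame so the result is easy to check by eye
DF <- data.frame(Index = 1:5, Num1 = c(2, 7, 10, 33, 48))
CheckArray <- seq(0, 100, by = 2)

# Hypothetical scalar function: only valid for a single x
classify_one <- function(x, ArrayToCheck) {
  if (x %in% ArrayToCheck) "in array" else "not in array"
}

DF <- mutate(
  DF,
  # vapply() calls classify_one once per element of Num1;
  # the extra argument is passed through unchanged
  Num1_class = vapply(Num1, classify_one, character(1),
                      ArrayToCheck = CheckArray),
  # .env$ makes it explicit that CheckArray comes from outside DF,
  # which only matters if DF also had a column called CheckArray
  Num1_even = Num1 %in% .env$CheckArray
)

print(DF)
```

The vapply version is slower than a vectorized function on large data, so it is probably best kept for logic that cannot reasonably be rewritten with ifelse or %in%.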

Upvotes: 2
