Liem Binh

Reputation: 1

Function does not work with lubridate/mutate/across but works with a loop

I try to fix dates (years) using a function

change_century <- function(x){
  a <- year(x)
  ifelse(test = a >2020,yes = year(x) <- (year(x)-100),no = year(x) <- a)
  return(x)
}

The function works for a specific row, or when using a loop over one column (here, date of birth)

for (i in c(1:nrow(Df))){
  Df_recode$DOB[i] <- change_century(Df$DOB[i])
}

Then I try to use mutate/across

Df_recode <- Df %>% mutate(across(list_variable_date,~change_century(.)))

It does not work. Is there something I am getting wrong? Thank you!

Upvotes: 0

Views: 139

Answers (1)

r2evans

Reputation: 160607

Try:

change_century <- function(x){
  a <- year(x)
  newx <- ifelse(test = a > 2020, yes = a - 100, no = a)
  return(newx)
}

(Frankly, I used newx as temporary storage and then returned it solely to keep the changes to your code minimal. In general one does not need return here; in theory it even adds an unnecessary function call to the evaluation stack. I would tend to have just two lines in that function: a <- year(x) and ifelse(..), without assignment. R returns the value of the last expression by default, which here would be the result of ifelse, which is what we want. Assigning it to newx and then calling return(newx), or even just writing newx as the last expression, has exactly the same effect.)
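The two-line version described above might look like the sketch below. It assumes lubridate is attached for year(); the name change_century_short is hypothetical, chosen only to avoid clashing with the function above.

```r
library(lubridate)

# Hypothetical name; same logic as change_century above, minus the
# temporary variable and the explicit return().
change_century_short <- function(x) {
  a <- year(x)
  ifelse(a > 2020, a - 100, a)  # value of the last expression is returned
}

short_result <- change_century_short(as.Date(c("2045-03-01", "1998-07-15")))
short_result
# 1945 1998
```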

Rationale

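ifelse cannot usefully have variable assignment within it. That's not to say that it is a syntax error (it is not), but that it is counter to its intent. You are asking the function to go through each condition found in test=, and return a value based on it. ifelse does not evaluate yes= and no= element by element: whenever at least one element of test= selects a branch, that branch's whole expression is evaluated, and then ifelse joins the results together as needed. With a full column that mixes both cases, both assignments therefore run in turn, and the second (year(x) <- a) undoes the first; with a single row, as in your loop, only the matching branch runs, which is why the loop appeared to work.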
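A base-R sketch of the same pattern, without lubridate, may make this concrete. The function name shift_if_large and the threshold of 5 are made up for illustration; the two assignments inside ifelse play the role of the two year(x) <- ... assignments in the question.

```r
# Hypothetical stand-in for the question's function: assignments inside ifelse().
shift_if_large <- function(x) {
  a <- x
  ifelse(a > 5, x <- a - 100, x <- a)  # return value of ifelse() is discarded
  x
}

one_big   <- shift_if_large(10)        # only the yes= assignment runs
one_small <- shift_if_large(1)         # only the no= assignment runs
mixed     <- shift_if_large(c(10, 1))  # yes= runs, then no= overwrites x again

one_big    # -90
one_small  # 1
mixed      # 10 1 (unchanged)
```

Called one element at a time, only the matching branch is evaluated, so the side effect happens to be the right one. Called on a vector containing both cases, the no= assignment runs last and restores the original values.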

For demonstration,

ifelse(test = c(TRUE, FALSE, TRUE), yes = 1:3, no = 11:13)

The return value is something like:

c(
  if (test[1]) yes[1] else no[1],
  if (test[2]) yes[2] else no[2],
  if (test[3]) yes[3] else no[3]
)
# c(1, 12, 3)

To capture the results of the zipped-together yeses and nos c(1, 12, 3), one must capture the return value from ifelse itself, not inside of the call to ifelse.
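Note that change_century as rewritten above returns year numbers, not dates. If dates are wanted back (as mutate/across over date columns presumably requires), one option is to capture the ifelse result and assign it to year(x) in a single step. A minimal sketch, assuming lubridate is attached; the name change_century_date and the sample dates are hypothetical:

```r
library(lubridate)

# Hypothetical variant that returns dates rather than bare year numbers.
change_century_date <- function(x) {
  a <- year(x)
  # Capture the vector that ifelse() returns, then assign it in one step.
  year(x) <- ifelse(a > 2020, a - 100, a)
  x
}

dob <- as.Date(c("2045-03-01", "1998-07-15", "2021-12-31"))
fixed <- change_century_date(dob)
fixed
# "1945-03-01" "1998-07-15" "1921-12-31"
```

Because this version works on a whole vector at once, the mutate(across(...)) call from the question should be able to apply it column by column without a loop.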

Another point that may be relevant: ifelse(cond, yes, no) is not at all a shortcut for if (cond) { yes } else { no }. Some key differences:

  • in if, the cond must always be exactly length 1, no more, no less.

    In R < 4.2, length 0 raises the error "argument is of length zero" (see ref), while length 2 or more produces the warning "the condition has length > 1 and only the first element will be used" (see ref1, ref2).

    In R >= 4.2, both conditions (should) produce an error (no warnings).

  • ifelse is intended to be vectorized, so the condition (its test= argument) can be any length. yes= and no= should either be the same length or length 1 (recycling is in effect here); test= should really be the same length as the longer of yes= and no=.

  • if does short-circuiting, meaning that if (TRUE || stop("quux")) 1 will never attempt to evaluate stop. This can be very useful when one condition will fail (logically or with a literal error) if attempted on a NULL object, such as if (!is.null(quux) && quux > 5) ....

    Conversely, ifelse has no element-wise short-circuiting: test= is always evaluated, and yes= (or no=) is evaluated in full, for every element, as soon as at least one element of test= selects it.
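The differences listed above can be tried out in base R. This is only an illustrative sketch; the values, the "big"/"small" labels, and the "boom" message are made up.

```r
# if short-circuits: stop() is never reached because TRUE || ... is already TRUE.
short_circuit <- if (TRUE || stop("quux")) 1

# Guarding against NULL with &&: the comparison is skipped when quux is NULL.
quux <- NULL
guarded <- if (!is.null(quux) && quux > 5) "big" else "not big"

# ifelse is vectorized; length-1 yes= and no= are recycled to the length of test=.
labels <- ifelse(c(1, 7, 3, 9) > 5, "big", "small")

# No element-wise short-circuit: one FALSE in test= is enough for the whole
# no= expression to be evaluated, so stop() fires here.
msg <- tryCatch(
  ifelse(c(TRUE, FALSE), 1, stop("boom")),
  error = function(e) conditionMessage(e)
)

short_circuit  # 1
guarded        # "not big"
labels         # "small" "big" "small" "big"
msg            # "boom"
```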

Upvotes: 3
