Moohan

Reputation: 1011

Making a custom function apply rowwise in dplyr mutate

I have a custom boolean function that checks a string. My actual function does more than the one below, which is only an illustrative example.

If I use the first version with dplyr::mutate(), it is only applied to the first value, and every row is then set to that result.

I can wrap the function body in purrr::map_lgl(), but this seems very slow on larger datasets. It also doesn't seem to be the way mutate() normally works.

library(tidyverse)

valid_string <- function(string) {
  # Check the length
  if (stringr::str_length(string) != 10) {
    return(FALSE)
  }
  return(TRUE)
}

# Create a tibble to test on
test_tib <- tibble::tibble(string = c("1504915593", "1504915594", "9999999999", "123"),
                           known_valid = c(TRUE, TRUE, TRUE, FALSE))

# Apply the function
test_tib <- dplyr::mutate(test_tib, check_valid = valid_string(string))
test_tib

valid_string2 <- function(string) {
  purrr::map_lgl(string, function(string) {
    # Check the length
    if (stringr::str_length(string) != 10) {
      return(FALSE)
    }
    return(TRUE)
  })
}

# Apply the function
test_tib <- dplyr::mutate(test_tib, check_valid2 = valid_string2(string))
test_tib

Upvotes: 1

Views: 286

Answers (2)

DSGym

Reputation: 2867

I would suggest rewriting your function as a vectorized function, like this:

valid_string <- function(string) {
  # Check the length
  ifelse(stringr::str_length(string) != 10, FALSE, TRUE)
}
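
The reason this works is that str_length() and the comparison operate on the whole vector, whereas if() expects a single TRUE or FALSE. A minimal sketch of that idea follows. It also shows that the comparison can be returned directly, and that further checks can be combined element-wise. The names valid_string_direct and valid_string_multi, and the digits-only rule, are hypothetical and not part of the original post.

```r
library(stringr)

strings <- c("1504915593", "1504915594", "9999999999", "123")

# str_length() and `==` are vectorized: one result per element
str_length(strings) == 10

# The comparison already yields a logical vector, so it can be returned as is
# (valid_string_direct is a hypothetical name)
valid_string_direct <- function(string) {
  stringr::str_length(string) == 10
}

# Additional checks can be combined element-wise with `&` (not `&&`);
# the digits-only rule is an invented stand-in for "doing more"
valid_string_multi <- function(string) {
  stringr::str_length(string) == 10 & stringr::str_detect(string, "^[0-9]+$")
}

valid_string_direct(strings)
valid_string_multi(c(strings, "abcdefghij"))
```

Either function can be passed to dplyr::mutate() in the same way as valid_string() above.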

Another option is the Vectorize() function from base R, which works like this:

valid_string2 <- function(string) {
  # Check the length
  if(stringr::str_length(string) != 10) {
    return(FALSE)
  }
  return(TRUE)
}
valid_string2 <- Vectorize(valid_string2)    

Both work well, but I would suggest the ifelse() solution.
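
One reason to prefer a truly vectorized function: Vectorize() is a wrapper around mapply(), so the scalar function is still called once per element. The sketch below illustrates this with base R only. The names valid_scalar, valid_vec and valid_vec_plain are hypothetical, and nchar() stands in for stringr::str_length() to keep the example self-contained.

```r
# Scalar check in the style of the question (hypothetical name)
valid_scalar <- function(string) {
  if (nchar(string) != 10) {
    return(FALSE)
  }
  TRUE
}

valid_vec <- Vectorize(valid_scalar)

strings <- c("1504915593", "123")

# Vectorize() builds a function that calls mapply() on the scalar function
res_vectorize <- valid_vec(strings)
res_mapply <- mapply(valid_scalar, strings)

# With character input, the input strings become names on the result;
# unname() drops them
unname(res_vectorize)

# USE.NAMES = FALSE avoids the names in the first place
valid_vec_plain <- Vectorize(valid_scalar, USE.NAMES = FALSE)
valid_vec_plain(strings)
```

This per-element looping is likely why both Vectorize() and the purrr approach feel slow on larger datasets compared with a single vectorized comparison.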

# Create a tibble to test on
test_tib <- tibble::tibble(string = c("1504915593", "1504915594", "9999999999", "123"),
                           known_valid = c(TRUE, TRUE, TRUE, FALSE))

# Apply the function
test_tib <- dplyr::mutate(test_tib, check_valid = valid_string(string))
test_tib <- dplyr::mutate(test_tib, check_valid2 = valid_string2(string))
test_tib


  string     known_valid check_valid check_valid2
  <chr>      <lgl>       <lgl>       <lgl>       
1 1504915593 TRUE        TRUE        TRUE        
2 1504915594 TRUE        TRUE        TRUE        
3 9999999999 TRUE        TRUE        TRUE        
4 123        FALSE       FALSE       FALSE

Upvotes: 1

Mike

Reputation: 4400

Is this what you are looking for?

test_tib <- dplyr::mutate(test_tib, checkval = ifelse(nchar(string) != 10, FALSE, TRUE))

Upvotes: 0
