elcortegano

Reputation: 2684

Passing a custom function with conditionals to dplyr::mutate

I have a dataset like the following:

 library(dplyr)

 seq <- tibble(REF = c("A","C","G","T","C","G"),
        REF2 = c("A","G","G","A","C","G")) %>%
   dplyr::mutate(UP = dplyr::lag(REF, n=1),
                 DOWN = dplyr::lead(REF, n=1))

# A tibble: 6 x 4
#  REF   REF2  UP    DOWN 
#  <chr> <chr> <chr> <chr>
#1 A     A     NA    C    
#2 C     G     A     G    
#3 G     G     C     T    
#4 T     A     G     C    
#5 C     C     T     G    
#6 G     G     C     NA 

I would like to swap the letters in the UP and DOWN columns (A with T, and G with C) whenever the contents of the REF and REF2 columns differ. To do so, I have written a small function and run it with dplyr::mutate as follows:

switch_strand <- function(base) {
  if (base=="A") return ("T")
  else if (base=="T") return ("A")
  else if (base=="G") return ("C")
  else if (base=="C") return ("G")
  else if (is.na(base)) return (NA) 
  else stop("Error, base does not exist")
}

seq %>% dplyr::mutate(UP = ifelse(REF!=REF2,switch_strand(UP),UP),
                      DOWN = ifelse(REF!=REF2,switch_strand(DOWN),DOWN))

But the following error is obtained:

Error in if (base == "A") return("T") else if (base == "T") return("A") else if (base ==  : 
  missing value where TRUE/FALSE needed
In addition: Warning message:
In if (base == "A") return("T") else if (base == "T") return("A") else if (base ==  : 
  the condition has length > 1 and only the first element will be used

I don't understand this: aren't the values passed to functions inside dplyr::mutate handled row by row? The function works as expected when a single letter is passed, but I do not understand why the full column is being passed as the argument here. How can this be fixed?

The expected output is:

# A tibble: 6 x 4
#  REF   REF2  UP    DOWN 
#  <chr> <chr> <chr> <chr>
#1 A     A     NA    C    
#2 C     G     T     C    
#3 G     G     C     T    
#4 T     A     C     G    
#5 C     C     T     G    
#6 G     G     C     NA

EDIT: I have changed the switch_strand function so that it should return NA if base is NA, but it still seems to fail in that case. The error might be related to this.

Upvotes: 0

Views: 315

Answers (2)

Ronak Shah

Reputation: 388907

As already mentioned in the comments, if/else is not vectorized, so the function currently works only for scalar input and not for vectors. mutate passes the whole column to the function, not one row at a time. Since you are using dplyr, we can use case_when to make the function vectorized.

library(dplyr)

switch_strand <- function(base) {
  case_when(base == "A" ~ "T",
            base == "T" ~ "A",
            base == "G" ~ "C",
            base == "C" ~ "G")
}

With this version, the code from the question works as intended:

seq %>% 
  mutate(UP = ifelse(REF!=REF2,switch_strand(UP),UP),
         DOWN = ifelse(REF!=REF2,switch_strand(DOWN),DOWN))

#  REF   REF2  UP    DOWN 
#  <chr> <chr> <chr> <chr>
#1 A     A     NA    C    
#2 C     G     T     C    
#3 G     G     C     T    
#4 T     A     C     G    
#5 C     C     T     G    
#6 G     G     C     NA   
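
For readers wondering where the two messages in the question come from, the following base R sketch reproduces the situation without dplyr. The vector `up` is a hand-made copy of the UP column, and `complement` and `switch_strand_lookup` are illustrative names that do not appear in the question. It also shows a named lookup vector as one possible alternative to case_when. Note that, unlike the original function, this sketch returns NA for an unknown base instead of stopping.

```r
# Hand-made copy of the UP column from the question
up <- c(NA, "A", "C", "G", "T", "C")

# mutate() hands the whole column to the function, so the comparison
# yields one value per row, and the first of them is NA
cond <- up == "A"
length(cond)   # 6
cond[1]        # NA

# Hypothetical alternative: a named lookup vector, indexed by base
complement <- c(A = "T", T = "A", G = "C", C = "G")

switch_strand_lookup <- function(base) {
  # Indexing by name is vectorized; NA or unknown names give NA
  unname(complement[base])
}

switch_strand_lookup(up)
```

The length-6 condition corresponds to the warning in the question, and its NA first element to the "missing value where TRUE/FALSE needed" error. Recent R versions (4.2 and later, as far as I know) treat a condition of length greater than one as an error, not a warning.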

Upvotes: 1

norimg

Reputation: 96

Add dplyr::rowwise() before mutate, so that the expressions are evaluated one row at a time:

seq %>% dplyr::rowwise() %>% dplyr::mutate(UP = ifelse(REF!=REF2,switch_strand(UP),UP),
                      DOWN = ifelse(REF!=REF2,switch_strand(DOWN),DOWN))
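
One caveat, offered as a hedged observation, not a tested claim about every dataset: with rowwise(), the question's function receives one value per call, and in the example data the rows containing NA happen to have REF equal to REF2, so ifelse does not need the result of switch_strand for them. If an NA ever coincided with a mismatch, the original function would likely still fail, because `base == "A"` is evaluated before `is.na(base)`. A minimal sketch that checks for NA first is shown below; `switch_strand_scalar` and `switch_strand_each` are hypothetical names, and the vapply wrapper is just one way to apply a scalar function element by element without rowwise().

```r
switch_strand_scalar <- function(base) {
  # Test for NA first: NA == "A" is NA, which if() cannot handle
  if (is.na(base)) return(NA_character_)
  switch(base,
         A = "T", T = "A", G = "C", C = "G",
         stop("Error, base does not exist"))
}

# Apply the scalar function to each element of a character vector
switch_strand_each <- function(bases) {
  vapply(bases, switch_strand_scalar, character(1), USE.NAMES = FALSE)
}

switch_strand_each(c(NA, "A", "C", "G", "T"))
```

This keeps the original behaviour of stopping on an unknown base, at the cost of one function call per element.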

Upvotes: 4
