Reputation: 2684
I have a dataset like the following:
seq <- tibble(REF = c("A","C","G","T","C","G"),
REF2 = c("A","G","G","A","C","G")) %>%
dplyr::mutate(UP = dplyr::lag(REF, n=1),
DOWN = dplyr::lead(REF, n=1))
# A tibble: 6 x 4
# REF REF2 UP DOWN
# <chr> <chr> <chr> <chr>
#1 A A NA C
#2 C G A G
#3 G G C T
#4 T A G C
#5 C C T G
#6 G G C NA
I would like to swap the letters in the UP and DOWN columns for their complements (A <-> T and G <-> C) in rows where the REF and REF2 columns differ. To do so, I have written a small function and call it inside dplyr::mutate as follows:
switch_strand <- function(base) {
if (base=="A") return ("T")
else if (base=="T") return ("A")
else if (base=="G") return ("C")
else if (base=="C") return ("G")
else if (is.na(base)) return (NA)
else stop("Error, base does not exist")
}
seq %>% dplyr::mutate(UP = ifelse(REF!=REF2,switch_strand(UP),UP),
DOWN = ifelse(REF!=REF2,switch_strand(DOWN),DOWN))
But I get the following error:
Error in if (base == "A") return("T") else if (base == "T") return("A") else if (base == :
  missing value where TRUE/FALSE needed
In addition: Warning message:
In if (base == "A") return("T") else if (base == "T") return("A") else if (base == :
  the condition has length > 1 and only the first element will be used
I don't understand this. Aren't the values inside dplyr::mutate processed row by row? The function works as expected when given a single letter, but I do not understand why the full column is passed as the argument here. How can this be fixed?
The expected output is:
# A tibble: 6 x 4
# REF REF2 UP DOWN
# <chr> <chr> <chr> <chr>
#1 A A NA C
#2 C G T C
#3 G G C T
#4 T A C G
#5 C C T G
#6 G G C NA
EDIT: I have fixed the switch_strand function so that it should return NA if base is NA, but it still seems to fail in that case. The error might be related to this.
Upvotes: 0
Views: 315
Reputation: 388907
As already mentioned in the comments, if/else is not vectorized, so the function currently works only for scalar input and not for vectors. Since you are using dplyr, we can use case_when to make it vectorized.
library(dplyr)
switch_strand <- function(base) {
  # Vectorized lookup; values matching no condition (including NA) become NA
  case_when(base == "A" ~ "T",
            base == "T" ~ "A",
            base == "G" ~ "C",
            base == "C" ~ "G")
}
With that change, the attempted code works as expected:
seq %>%
mutate(UP = ifelse(REF!=REF2,switch_strand(UP),UP),
DOWN = ifelse(REF!=REF2,switch_strand(DOWN),DOWN))
# REF REF2 UP DOWN
# <chr> <chr> <chr> <chr>
#1 A A NA C
#2 C G T C
#3 G G C T
#4 T A C G
#5 C C T G
#6 G G C NA
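As a possible alternative, the same mapping can be sketched in base R with a named character vector, since indexing by name is already vectorized. The names complement and switch_strand_lookup are hypothetical and not part of the original code. Note that, unlike the original function, this sketch returns NA for an unknown base instead of raising an error.

```r
# Hypothetical lookup table: names are the input bases, values their complements
complement <- c(A = "T", T = "A", G = "C", C = "G")

switch_strand_lookup <- function(base) {
  # Indexing by name handles a whole vector at once;
  # NA (or an unknown base) yields NA rather than an error
  unname(complement[base])
}

# Works on a full column, not just a single letter
switch_strand_lookup(c("A", "G", NA, "T"))
```

This function should be usable inside the same mutate/ifelse call as switch_strand above.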
Upvotes: 1
Reputation: 96
Add dplyr::rowwise() before mutate so that the function is called on one row at a time:
seq %>%
  dplyr::rowwise() %>%
  dplyr::mutate(UP = ifelse(REF != REF2, switch_strand(UP), UP),
                DOWN = ifelse(REF != REF2, switch_strand(DOWN), DOWN))
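One caveat with the row-wise approach: in the question's function the is.na check comes after the == comparisons, so an NA input would still hit if (NA == "A") and fail. With the example data this never happens, because the rows containing NA have REF equal to REF2 and ifelse does not evaluate the function there. A minimal sketch of a more defensive scalar version follows, assuming the NA check is moved first. The names switch_strand_scalar and dna are hypothetical.

```r
library(dplyr)

# Hypothetical scalar version: test for NA before any comparison
switch_strand_scalar <- function(base) {
  if (is.na(base)) return(NA_character_)
  switch(base,
         A = "T", T = "A", G = "C", C = "G",
         stop("Error, base does not exist"))  # default for unknown bases
}

# Same data as in the question, under a name that does not mask base::seq
dna <- tibble(REF  = c("A", "C", "G", "T", "C", "G"),
              REF2 = c("A", "G", "G", "A", "C", "G")) %>%
  mutate(UP = lag(REF, n = 1),
         DOWN = lead(REF, n = 1))

result <- dna %>%
  rowwise() %>%   # each row is evaluated separately, so base has length 1
  mutate(UP = ifelse(REF != REF2, switch_strand_scalar(UP), UP),
         DOWN = ifelse(REF != REF2, switch_strand_scalar(DOWN), DOWN)) %>%
  ungroup()       # drop the row-wise grouping again

result
```

Row-wise evaluation calls the function once per row, so on large tables a vectorized function is likely to be faster.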
Upvotes: 4