alki

Reputation: 3574

Regex R substituting in a vector of replacements with parentheses

Suppose I have a string x like so.

x <- "CTTTANNNNNNNYG"

I would like to replace each letter in x with a different string that may not be of the same length.

a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)","(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

If I wanted to replace the letters in vector a with the corresponding ones in vector b, I would want to manipulate string x into:

"CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

I've tried using mapply(gsub, a,b,x) and str_replace() to no avail. Any help would be appreciated.

Upvotes: 1

Views: 52

Answers (3)

nicola

Reputation: 24480

Since the replacements are "fixed" and each involves just one letter, you can achieve the same result without using either regex or any additional packages. For instance:

vapply(strsplit(x,"",fixed=TRUE),function(z) paste(setNames(b,a)[z],collapse=""),"")
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"
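
The one-liner above can be unpacked into separate steps. This is only a sketch of the same lookup idea for a single string; the names lookup, chars and result are illustrative and not part of the original answer. Note that any character in x that is missing from a would index to NA and be pasted as the text "NA".

```r
x <- "CTTTANNNNNNNYG"
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

# Named character vector: names are the codes in a, values are the replacements in b
lookup <- setNames(b, a)

# Split the string into single characters (strsplit returns a list, take element 1)
chars <- strsplit(x, "", fixed = TRUE)[[1]]

# Look up each character by name and glue the pieces back together
result <- paste(lookup[chars], collapse = "")
print(result)
```

Because every character is looked up exactly once, the order of a and b does not matter here.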

Upvotes: 4

MrFlick

Reputation: 206187

If you want to do this with base functions, you basically need to do each of the replacements sequentially (gsub isn't vectorized over patterns in this way). Here's one way to do that:

Reduce(
    function(x, replace) {
        gsub(replace$pattern, replace$value, x)
    }, 
    Map(function(a,b) list(pattern=a, value=b), a, b), 
    init=x
)
# [1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

We use Map to make pairs of match/replace values and then sequentially apply them to the string with Reduce.
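
For comparison, the same sequential idea can be written as a plain loop. This is a sketch rather than the answer's own code: the variable out and the use of fixed = TRUE (literal matching instead of regex) are choices made here for illustration.

```r
x <- "CTTTANNNNNNNYG"
a <- c("A","C","G","T","W","S","M","K","R","Y","B","D","H","V","N")
b <- c("A","C","G","T","(A|T)","(C|G)","(A|C)","(G|T)","(A|G)","(C|T)",
       "(C|G|T)","(A|G|T)","(A|C|T)","(A|C|G)","(A|C|G|T)")

out <- x
for (i in seq_along(a)) {
  # Apply one literal replacement at a time to the running result
  out <- gsub(a[i], b[i], out, fixed = TRUE)
}
print(out)
```

Sequential replacement is safe for this particular data because the inserted text contains only A, C, G, T, "|" and parentheses, and A, C, G and T map to themselves. In general, a later pattern could match text inserted by an earlier replacement, so the order of the pairs can matter.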

Upvotes: 2

akrun

Reputation: 886948

We can use mgsub from the qdap package:

library(qdap)
mgsub(a, b, x)
#[1] "CTTTA(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(A|C|G|T)(C|T)G"

Upvotes: 4
