Nivel

Reputation: 679

String replace ignoring characters

I have the following string:

string <- c("ABDSFGHIJLKOP")

and list of substrings:

sub <- c("ABDSF", "SFGH", "GHIJLKOP")

I would like to wrap each substring match in < and >, giving:

<ABD><SF><GH><GHIJKOP>

I have tried the following code, which loops over the substrings and replaces each one in turn. However, as soon as ABDSF is matched, SFGH is no longer recognised because of the inserted < > characters. Does anybody have a better idea?

library(stringr)
library(magrittr)

string <- c("ABDSFGHIJLKOP")
sub <- c("ABDSF", "SFGH", "GHIJLKOP")

# wrap every match of each substring in < >, updating string in place
for (s in sub) {
  string %<>% str_replace_all(s, paste0("<", s, ">"))
}

print(string)


Result: [1] "<ABDSF><GHIJLKOP>"

EDIT: The problem with the code above is that once the < > characters have been inserted for the first match, the second substring SFGH is no longer recognised, because the string is now:

<ABDSFGHIJLKOP> is not it; it is <ABDSF>GHIJLKOP

So I am looking for a way to match the substrings while ignoring the < and > characters.
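One way to sidestep the problem, sketched here as an assumption rather than as the method used in either answer, is to locate the occurrences of every substring in the original, unmarked string first, and only then insert the markers in a single pass. The helper name `mark_substrings` is hypothetical; it uses literal matching via `gregexpr(..., fixed = TRUE)`, and overlapping matches come out as interleaved markers rather than as separate pieces.

```r
# Hypothetical helper (not from the question or answers): record where each
# substring occurs in the ORIGINAL string, then insert all markers at once,
# so earlier insertions cannot hide later matches.
mark_substrings <- function(string, subs) {
  chars <- strsplit(string, "")[[1]]
  n_open <- integer(length(chars))   # number of "<" to place before each char
  n_close <- integer(length(chars))  # number of ">" to place after each char
  for (s in subs) {
    # start positions of the non-overlapping literal occurrences of s
    starts <- gregexpr(s, string, fixed = TRUE)[[1]]
    if (starts[1] == -1) next        # s does not occur in string
    ends <- starts + nchar(s) - 1
    n_open[starts] <- n_open[starts] + 1
    n_close[ends] <- n_close[ends] + 1
  }
  paste0(strrep("<", n_open), chars, strrep(">", n_close), collapse = "")
}

mark_substrings("ABDSFGHIJLKOP", c("ABDSF", "SFGH", "GHIJLKOP"))
## [1] "<ABD<SF><GH>IJLKOP>"
```

Because positions are computed before anything is inserted, SFGH is found even though ABDSF overlaps it.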

Upvotes: 0

Views: 211

Answers (2)

G. Grothendieck

Reputation: 269644

Place [<>]* between successive characters of each element of subs and then perform the substitutions with those patterns. No packages are used.

# test input
string <- "ABDSFGHIJLKOP"
subs <- c("ABDSF", "SFGH", "GHIJLKOP")

pats <- paste0("(", gsub("(.)(?=.)", "\\1[<>]*", subs, perl = TRUE), ")")
s <- string
for(p in pats) s <- gsub(p, "<\\1>", s)
s
## [1] "<ABD<SF><GH>IJLKOP>"
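To make the pattern-building step concrete, here is what it produces for a single substring when the `(?<=[EF])` lookbehind is not used, and why the resulting pattern still matches after a marker has been inserted. The variable name `p` is just for illustration.

```r
# Turn one substring into a pattern that tolerates "<" or ">" between
# any two of its characters.
p <- gsub("(.)(?=.)", "\\1[<>]*", "SFGH", perl = TRUE)
p
## [1] "S[<>]*F[<>]*G[<>]*H"

# After "<ABDSF>" has been marked, SFGH is no longer a literal substring,
# but the tolerant pattern still finds it across the inserted ">".
grepl("SFGH", "<ABDSF>GHIJLKOP", fixed = TRUE)
## [1] FALSE
grepl(p, "<ABDSF>GHIJLKOP")
## [1] TRUE
```

This approach assumes the substrings contain no regex metacharacters; otherwise they would need escaping before the patterns are built.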

Update

Regarding the comment below, if I understand correctly, we could add (?<=[EF]), giving:

pats <- paste0("(", gsub("(?<=[EF])(.)(?=.)", "\\1[<>]*", subs, perl = TRUE), ")")
s <- string
for(p in pats) s <- gsub(p, "<\\1>", s)
s
## [1] "<ABDSF><GHIJLKOP>"
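Inspecting the patterns helps explain the different output. With the lookbehind, `[<>]*` is added only after a character that is itself preceded by E or F, so ABDSF and GHIJLKOP stay literal and SFGH tolerates a marker only between G and H, not between F and G.

```r
subs <- c("ABDSF", "SFGH", "GHIJLKOP")
# (?<=[EF]) restricts the insertion of [<>]* to characters preceded by E or F
pats <- paste0("(", gsub("(?<=[EF])(.)(?=.)", "\\1[<>]*", subs, perl = TRUE), ")")
# pats is c("(ABDSF)", "(SFG[<>]*H)", "(GHIJLKOP)")

# SFGH therefore no longer matches across the ">" inserted after ABDSF
grepl(pats[2], "<ABDSF>GHIJLKOP")
## [1] FALSE
```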

Upvotes: 3

lagripe

Reputation: 764

# R version 3.3.2

library(stringr)

string <- "ABDSFGHIJLKOP"
sub <- c("ABDSF", "SFGH", "GHIJLKOP")
result <- ""
for (s in sub) {
  # str_extract() returns NA (not NULL) when there is no match
  temp <- str_extract(string, s)
  if (!is.na(temp)) {
    # wrap the matched substring in < > and append it to the result
    result <- paste0(result, "<", temp, ">")
  }
}
print(result)

Result :

[1] "<ABDSF><SFGH><GHIJLKOP>"
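A package-free sketch of the same idea, assuming literal (non-regex) matching is acceptable: keep the substrings that occur in the string, wrap each in < >, and concatenate them. Like the loop above, this builds a new string from the matched substrings rather than inserting markers into the original one.

```r
string <- "ABDSFGHIJLKOP"
subs <- c("ABDSF", "SFGH", "GHIJLKOP")
# TRUE for each substring that occurs literally in string
found <- vapply(subs, function(s) grepl(s, string, fixed = TRUE), logical(1))
# sprintf() returns character(0) when nothing is found, so result is then ""
result <- paste(sprintf("<%s>", subs[found]), collapse = "")
print(result)
## [1] "<ABDSF><SFGH><GHIJLKOP>"
```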

Tested in Rextester

Upvotes: 0
