tblznbits

gsub not replacing all expected matches in R

Let's say I have the string x <- "AbC" and I want to put an ampersand in between each letter. I would have assumed I could just do gsub("([a-zA-Z])([a-zA-Z])", "\\1 & \\2", x), but that produces "A & bC". Why doesn't gsub recognize the second set of letters that match the regex? It's not like gsub only replaces the first match found. If I have x <- "AbC DE" and run the same command, I get "A & bC D & E".
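For reference, here is a minimal R reproduction of the two calls described above. The names pattern, out1 and out2 are only illustrative:

```r
pattern <- "([a-zA-Z])([a-zA-Z])"

# "Ab" is matched and rewritten; "C" is left over with no second letter to pair with.
out1 <- gsub(pattern, "\\1 & \\2", "AbC")     # "A & bC"
# "Ab" and "DE" are two separate, non-overlapping matches.
out2 <- gsub(pattern, "\\1 & \\2", "AbC DE")  # "A & bC D & E"

print(out1)
print(out2)
```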

What am I missing in terms of how gsub does its replacement? I would have expected outputs of "A & b & C" and "A & b & C D & E" from the two inputs above.

Answers (1)

Avinash Raj

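Because once a character has been consumed by one match, the regex engine won't use it again in another match. In other words, gsub doesn't find overlapping matches. In "AbC", the first match consumes both "A" and "b", so the search resumes at "C", which has no letter after it. Use a lookahead, which checks for the next letter without consuming it. This needs perl=TRUE, because R's default regex engine doesn't support lookarounds: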

gsub("([a-zA-Z])(?=[a-zA-Z])", "\\1 & ", x, perl=TRUE)
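As a quick check, this sketch lists what each pattern actually matches using regmatches() and gregexpr(), then applies the lookahead version to both inputs from the question. The variable names (x1, x2, old_matches, new_matches, fixed1, fixed2) are only illustrative:

```r
x1 <- "AbC"
x2 <- "AbC DE"

# Original pattern: every match consumes two letters, so matches cannot overlap.
old_matches <- regmatches(x2, gregexpr("([a-zA-Z])([a-zA-Z])", x2))[[1]]
# Lookahead pattern: every match consumes one letter. The following letter is
# only checked by (?=...), so it remains available to start the next match.
new_matches <- regmatches(x2, gregexpr("([a-zA-Z])(?=[a-zA-Z])", x2, perl = TRUE))[[1]]

fixed1 <- gsub("([a-zA-Z])(?=[a-zA-Z])", "\\1 & ", x1, perl = TRUE)
fixed2 <- gsub("([a-zA-Z])(?=[a-zA-Z])", "\\1 & ", x2, perl = TRUE)

print(old_matches)  # "Ab" "DE"
print(new_matches)  # "A" "b" "D"
print(fixed1)       # "A & b & C"
print(fixed2)       # "A & b & C D & E"
```

The last letter of each word is never matched, because nothing follows it that satisfies the lookahead. That is why no trailing " & " is added.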
