Reputation: 1000
I'm stuck trying to normalize a bunch of addresses. Is there a regex that behaves like the \\b...\\b word-boundary pair in gsub() but can replace multiple words?
address <- c("SE Kellogg", "SE Kellogg Court")
gsub("\\bSE Kellogg\\b", "SE Kellogg Court", address)
# desired output:
# "SE Kellogg Court" "SE Kellogg Court"
# actual output:
# "SE Kellogg Court" "SE Kellogg Court Court"
Upvotes: 1
Views: 964
Reputation: 626689
You may use a PCRE regex with a negative lookahead:
\bSE Kellogg\b(?!\s+Court\b)
Details
- \\b - a word boundary
- SE Kellogg - a literal substring
- \\b - a word boundary
- (?!\\s+Court\\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
  - \\s+ - one or more whitespace chars
  - Court\\b - the whole word Court.
> gsub("\\bSE Kellogg\\b(?!\\s+Court\\b)", "SE Kellogg Court", address, perl=TRUE)
[1] "SE Kellogg Court" "SE Kellogg Court"
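To see which strings the lookahead actually lets through, here is a minimal sketch on a few extra addresses. The sample values are made up for illustration and are not from the question. Note that perl=TRUE is required, because R's default regex engine does not support lookaheads.

```r
# Hypothetical sample addresses (assumptions, not from the original question).
address <- c("SE Kellogg", "SE Kellogg Court", "SE Kellogg  Court",
             "SE Kellogg Ave", "SE Kelloggs")
pattern <- "\\bSE Kellogg\\b(?!\\s+Court\\b)"

# TRUE where the phrase occurs as whole words and is NOT already followed by "Court".
matched <- grepl(pattern, address, perl = TRUE)

# Only the matched elements are rewritten; the rest are returned unchanged.
result <- gsub(pattern, "SE Kellogg Court", address, perl = TRUE)

print(data.frame(address, matched, result))
```

The pattern only checks for Court, so "SE Kellogg Ave" still matches and becomes "SE Kellogg Court Ave". If other suffixes can appear in your data, the lookahead would need to list them too.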
Note that you can shorten the replacement by wrapping the search phrase in a capturing group, (...), and using a \1 backreference (written "\\1" in an R string literal) in the replacement pattern:
gsub("\\b(SE Kellogg)\\b(?!\\s+Court\\b)", "\\1 Court", address, perl=TRUE)
         ^          ^                       ^^^
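If you need this for several street names, the same pattern can be wrapped in a small function. This is only a sketch: append_suffix_once is a hypothetical helper name, and it assumes that neither the phrase nor the suffix contains regex metacharacters or backslashes.

```r
# Hypothetical helper: append `suffix` after `phrase` unless it is already there.
# Assumes `phrase` and `suffix` are plain words with no regex metacharacters.
append_suffix_once <- function(x, phrase, suffix) {
  # Capture the phrase; the negative lookahead skips phrases already followed by the suffix.
  pattern <- sprintf("\\b(%s)\\b(?!\\s+%s\\b)", phrase, suffix)
  # "\\1" re-inserts the captured phrase, then the suffix is appended.
  gsub(pattern, paste0("\\1 ", suffix), x, perl = TRUE)
}

address <- c("SE Kellogg", "SE Kellogg Court")
normalized <- append_suffix_once(address, "SE Kellogg", "Court")
print(normalized)
```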
Upvotes: 6