philiporlando

Reputation: 1000

Using gsub to replace multiple words in R

I'm stuck trying to normalize a bunch of addresses. Is there a different regex that behaves similarly to wrapping a pattern in \\b...\\b when using gsub(), but that works for a pattern spanning multiple words?

address <- c("SE Kellogg", "SE Kellogg Court")
gsub("\\bSE Kellogg\\b", "SE Kellogg Court", address)

# desired output:
# "SE Kellogg Court" "SE Kellogg Court"

# actual output:
# "SE Kellogg Court" "SE Kellogg Court Court"

Upvotes: 1

Views: 964

Answers (1)

Wiktor Stribiżew

Reputation: 626689

You may use a PCRE regex with a negative lookahead:

\bSE Kellogg\b(?!\s+Court\b)

See the regex demo.

Details

  • \\b - a word boundary
  • SE Kellogg - a literal substring
  • \\b - a word boundary
  • (?!\\s+Court\\b) - a negative lookahead that fails the match if, immediately to the right of the current location, there are
    • \\s+ - one or more whitespace chars
    • Court\\b - a whole word Court.
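To see each part of the pattern at work, here is a small sketch that runs it through grepl() on a few test strings. The strings are made up for illustration (they are not from the question). Note that "SE Kellogg Courtyard" still matches, because Court\b requires Court to be a whole word, and "SE Kelloggs" does not match, because the trailing \b fails in the middle of a word.

```r
# Same PCRE pattern as above; perl = TRUE is required for the (?!...) lookahead
pattern <- "\\bSE Kellogg\\b(?!\\s+Court\\b)"

# Hypothetical test strings, one per behaviour of the pattern
candidates <- c(
  "SE Kellogg",           # no suffix: matches
  "SE Kellogg Court",     # already normalized: the lookahead blocks the match
  "SE Kellogg  Court",    # two spaces: \s+ still reaches "Court", so blocked
  "SE Kellogg Courtyard", # "Court" is not a whole word here, so it matches
  "SE Kelloggs"           # trailing word boundary fails inside "Kelloggs"
)

matches <- grepl(pattern, candidates, perl = TRUE)
print(setNames(matches, candidates))
```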

R demo:

> gsub("\\bSE Kellogg\\b(?!\\s+Court\\b)", "SE Kellogg Court", address, perl=TRUE)
[1] "SE Kellogg Court" "SE Kellogg Court"

Note that you can shorten the replacement if you put a capturing group ((...)) around the search phrase and use a \1 backreference in the replacement pattern:

gsub("\\b(SE Kellogg)\\b(?!\\s+Court\\b)", "\\1 Court", address, perl=TRUE)
         ^          ^                       ^^^   
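Because the replacement only reuses whatever the group captured, the same idea can be extended to several street names at once by putting an alternation inside the group. The sketch below assumes a hypothetical vector of street names (NE Alberta is invented for illustration) containing no regex metacharacters. Names containing characters like . or ( would need escaping first.

```r
# Hypothetical list of street names that should end in "Court"
streets <- c("SE Kellogg", "NE Alberta")

# Build \b(SE Kellogg|NE Alberta)\b(?!\s+Court\b); assumes names need no escaping
pattern <- paste0("\\b(", paste(streets, collapse = "|"), ")\\b(?!\\s+Court\\b)")

address <- c("SE Kellogg", "SE Kellogg Court", "NE Alberta", "NE Alberta Court")

# \\1 reinserts whichever street name matched, then appends " Court"
result <- gsub(pattern, "\\1 Court", address, perl = TRUE)
print(result)
```

Addresses that already end in Court are left untouched, so running the normalization twice gives the same result as running it once.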

Upvotes: 6
