Village.Idyot

Reputation: 2121

How to remove all characters before last whitespace in R string but with exceptions for certain character sequences?

I've been using the following to remove all characters before the last whitespace in R character strings: gsub(".*\\s", "", "Big Dog") returns "Dog" which is perfect.

How could I exclude certain patterns from being removed? For example, say I always want to preserve "Big Dog": given the string "Look at that crazy Big Dog", the gsub() call (or other code) should return "Big Dog", with the whitespace between Big and Dog retained. In the complete code this is intended for, the equivalent of "Big Dog" isn't dynamic, so hard-coding "Big Dog" is fine. "Big Dog" will also always occupy the last position in the character string.

Upvotes: 1

Views: 81

Answers (2)

Wiktor Stribiżew

Reputation: 627600

Since the last word (Dog here) is not known beforehand, you can use

sub("^.*?((?:\\bBig\\s+)?\\S+)$", "\\1", text)

Note the use of sub rather than gsub: you only need to search and replace once per string.

Details:

  • ^ - start of string
  • .*? - zero or more of any character, as few as possible
  • ((?:\bBig\s+)?\S+) - Group 1:
    • (?:\bBig\s+)? - an optional sequence of a whole word Big (\b is a word boundary) and then one or more whitespace chars (\s+)
    • \S+ - one or more non-whitespace chars
  • $ - end of string.

The \1 replacement puts back the value from Group 1 into the result.

R demo:

x <- c("Look at that crazy Dog", "Look at that crazy Big Dog")
sub("^.*?((?:\\bBig\\s+)?\\S+)$", "\\1", x)
# => [1] "Dog"     "Big Dog"
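
Since the protected word is fixed in the question, the same pattern could be wrapped in a small helper. This is only a sketch: keep_last_word and its prefix argument are hypothetical names that do not appear in the answer above, and perl = TRUE is an assumption made here so that the lazy quantifier and the non-capturing group are handled by PCRE.

```r
# Hypothetical helper (not part of the original answer): keep the last word,
# plus one optional fixed word immediately before it.
keep_last_word <- function(x, prefix = "Big") {
  # Assumption: `prefix` contains no regex metacharacters.
  pattern <- paste0("^.*?((?:\\b", prefix, "\\s+)?\\S+)$")
  # perl = TRUE is an assumption; the original answer uses the default engine.
  sub(pattern, "\\1", x, perl = TRUE)
}

x <- c("Look at that crazy Dog", "Look at that crazy Big Dog", "Dog")
result <- keep_last_word(x)
print(result)
```

A string with no whitespace at all, such as "Dog", should come back as is, because .*? is allowed to match nothing.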

Upvotes: 1

Tim Biegeleisen

Reputation: 522797

Assuming you know all the words and phrases which you don't want to replace at the end of a string, you could use the following whitelist approach:

input <- c("Look at that crazy Dog", "Look at that crazy Big Dog")
keep <- c("Big Dog", "Dog")
regex <- paste0(".*?\\b(", paste(keep, collapse="|"), ")$")
output <- sub(regex, "\\1", input)
output  # [1] "Dog"     "Big Dog"
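
Two properties of this approach may be worth checking against your own data. First, sub() returns a string unchanged when the pattern does not match, so inputs ending in something outside keep pass through untouched. Second, phrases containing regex metacharacters need escaping before they are pasted into the pattern. A minimal sketch of both, where escape_regex is a hypothetical helper, the extra inputs are made up for illustration, and perl = TRUE is an assumption:

```r
# Hypothetical helper: backslash-escape regex metacharacters in literal phrases.
escape_regex <- function(s) {
  gsub("([.|()\\^{}+$*?]|\\[|\\]|\\\\)", "\\\\\\1", s, perl = TRUE)
}

# Illustrative whitelist; "Dog (large)" contains metacharacters on purpose.
keep <- c("Big Dog", "Dog", "Dog (large)")
regex <- paste0(".*?\\b(", paste(escape_regex(keep), collapse = "|"), ")$")

input <- c("Look at that crazy Big Dog",
           "Look at that Dog (large)",
           "Look at that crazy Cat")   # ends in a word that is not whitelisted
result <- sub(regex, "\\1", input, perl = TRUE)
print(result)
```

The last input does not end in a whitelisted phrase, so it is returned whole rather than trimmed to "Cat". Whether that is the desired fallback depends on the data.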

Upvotes: 1
