remi

Reputation: 779

Pattern detection and splitting by space with exception

Objective: detect a pattern and split it by a space, with an exception.

The illustrative example:

data <- c("redcat","big bobcat","bobcat","greencat","pinkcat","north cat")

I would like to retrieve all instances where cat is assembled with another word:

> data[grepl(".[a-z]cat$",data)]
[1] "redcat"     "big bobcat" "bobcat"     "greencat"   "pinkcat"   

Each match then needs a space inserted before cat. The exception is bobcat, which is a separate level and must remain unaltered.

The ideal result should be then:

[1] "red cat"    "big bobcat" "bobcat"     "green cat"  "pink cat"

Any ideas how to achieve this? Thank you.

Upvotes: 0

Views: 51

Answers (1)

Avinash Raj

Reputation: 174776

Use a regex with a negative lookbehind assertion in sub or gsub. Lookbehind requires perl=TRUE.

> data <- c("redcat","big bobcat","bobcat","greencat","pinkcat","north cat")
> gsub("(?<!bob)cat", " cat", data[grepl(".[a-z]cat$",data)], perl=TRUE)
[1] "red cat"    "big bobcat" "bobcat"     "green cat"  "pink cat"
> gsub("(?<!\\bbob)cat", " cat", data[grepl(".[a-z]cat$",data)], perl=TRUE)
[1] "red cat"    "big bobcat" "bobcat"     "green cat"  "pink cat"

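(?<!\\bbob)cat makes the regex engine match every cat except one that is preceded by bob, where the \\b additionally requires bob to start at a word boundary. The first pattern, (?<!bob)cat, skips any cat preceded by the letters bob, even when they are the tail of a longer word.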
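
The two patterns give the same result on the sample data, so here is a small sketch of where they differ. The value "kabobcat" is a made-up input that is not in the question; it is there only to show the effect of \\b.

```r
# "kabobcat" is a hypothetical value, not part of the original data
x <- c("bobcat", "kabobcat", "redcat")

# Without \\b: any "cat" directly preceded by the letters "bob" is left alone
plain <- gsub("(?<!bob)cat", " cat", x, perl = TRUE)

# With \\b: only a "bob" that starts at a word boundary protects "cat"
bounded <- gsub("(?<!\\bbob)cat", " cat", x, perl = TRUE)

print(plain)    # "bobcat" "kabobcat"  "red cat"
print(bounded)  # "bobcat" "kabob cat" "red cat"
```

If bobcat is the only exception you care about, the \\b version is probably the safer choice.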

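Alternatively, apply the regex directly to data. \\B requires that cat does not start at a word boundary, so a standalone cat (as in "north cat") is skipped, and $ anchors the match to the end of the string. perl=TRUE is needed for the lookbehind.

gsub("(?<!\\bbob)\\Bcat$", " cat", data, perl=TRUE)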
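
A quick sketch of what the one-step call returns, assuming perl = TRUE is passed (the lookbehind is not supported by R's default regex engine). Unlike the subset-first version, it keeps every element of data, so "north cat" comes back unchanged. The names res and matched are illustrative only.

```r
data <- c("redcat", "big bobcat", "bobcat", "greencat", "pinkcat", "north cat")

# One pass over the whole vector; non-matching elements pass through untouched
res <- gsub("(?<!\\bbob)\\Bcat$", " cat", data, perl = TRUE)
print(res)
# "red cat" "big bobcat" "bobcat" "green cat" "pink cat" "north cat"

# To keep only the compound matches from the question, subset with its grepl
matched <- res[grepl(".[a-z]cat$", data)]
print(matched)
# "red cat" "big bobcat" "bobcat" "green cat" "pink cat"
```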

Upvotes: 3
