Reputation: 779
Objective: detect a pattern and split it by a space, with an exception.
The illustrative example:
data <- c("redcat","big bobcat","bobcat","greencat","pinkcat","north cat")
I would like to retrieve all instances where cat is joined to another word:
> data[grepl(".[a-z]cat$",data)]
[1] "redcat" "big bobcat" "bobcat" "greencat" "pinkcat"
When found, each match needs to be split by a space. The exception is bobcat, which is a separate level and needs to remain unaltered.
The ideal result would then be:
[#] "red cat" "big bobcat" "bobcat" "green cat" "pink cat"
Any ideas how to achieve this? Thank you.
Upvotes: 0
Views: 51
Reputation: 174776
Use a regex with a negative lookbehind assertion in the sub or gsub function.
> data <- c("redcat","big bobcat","bobcat","greencat","pinkcat","north cat")
> gsub("(?<!bob)cat", " cat", data[grepl(".[a-z]cat$",data)], perl=TRUE)
[1] "red cat" "big bobcat" "bobcat" "green cat" "pink cat"
> gsub("(?<!\\bbob)cat", " cat", data[grepl(".[a-z]cat$",data)], perl=TRUE)
[1] "red cat" "big bobcat" "bobcat" "green cat" "pink cat"
(?<!\\bbob)cat makes the regex engine match every cat except one that is preceded by the whole word bob.
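To see what the \\b inside the lookbehind changes, here is a small sketch comparing the two patterns. The input "kabobcat" and the variable names words, plain and bounded are made up for illustration and are not part of the question.

```r
# Hypothetical inputs: "kabobcat" does not appear in the original data.
words <- c("bobcat", "kabobcat", "redcat")

# Without \\b: any "cat" directly after the letters "bob" is left alone.
plain <- gsub("(?<!bob)cat", " cat", words, perl = TRUE)

# With \\b: only "cat" directly after the whole word "bob" is left alone.
bounded <- gsub("(?<!\\bbob)cat", " cat", words, perl = TRUE)

print(plain)    # "bobcat" "kabobcat"  "red cat"
print(bounded)  # "bobcat" "kabob cat" "red cat"
```

Which of the two is right depends on whether words that merely end in bob should also be protected.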
Alternatively, apply the regex directly to data:
gsub("(?<!\\bbob)\\Bcat$", " cat", data, perl=TRUE)
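As a rough sketch, the direct approach can be wrapped in a small function. The name split_cat is hypothetical, not something from base R; the pattern is the one above, run with perl = TRUE because lookbehind is only supported by the PCRE engine.

```r
# split_cat is a hypothetical helper name, not part of base R.
split_cat <- function(x) {
  # \\B requires a word character right before "cat", so "north cat" is skipped;
  # the negative lookbehind skips "cat" preceded by the whole word "bob".
  gsub("(?<!\\bbob)\\Bcat$", " cat", x, perl = TRUE)
}

data <- c("redcat", "big bobcat", "bobcat", "greencat", "pinkcat", "north cat")
result <- split_cat(data)
print(result)
# "red cat" "big bobcat" "bobcat" "green cat" "pink cat" "north cat"
```

Note that, unlike the grepl-filtered calls, this returns every element of data, so "north cat" is kept and passes through unchanged.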
Upvotes: 3