Reputation: 33
I want to remove the last letter "O", except where it is part of the word "HELLO".
I've tried doing this:
Example:
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("[^HELLO]O\\>","",a)
[1] "HELLO " " HELLO" "T " "HO"
but I want
"HELLO X" "D HELLO" "TW X" "H"
Upvotes: 3
Views: 67
Reputation: 1623
Your regular expression is not correct. [^HELLO] means any single character except H, E, L and O. But you need to exclude only an O that is preceded by exactly HELL. So you should use the following expression:
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("(?<!\\bHELL)O\\b", "", a, perl=TRUE)
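To make the difference concrete, here is a small side-by-side sketch. The variable names class_result and lookbehind_result are only illustrative, and \\b is used in place of \\> so that both calls can run with perl=TRUE.

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")

# [^HELLO] is a negated character class: it matches ONE character that is
# not H, E, L or O, and it consumes that character. So the letter before
# the O is deleted too, and "HO" is untouched because H is in the set.
class_result <- gsub("[^HELLO]O\\b", "", a, perl = TRUE)

# (?<!\\bHELL) is a zero-width negative lookbehind: it only inspects what
# precedes the O and consumes nothing, so only the O itself is removed.
lookbehind_result <- gsub("(?<!\\bHELL)O\\b", "", a, perl = TRUE)

print(class_result)
print(lookbehind_result)
```

The first call reproduces the unwanted output from the question, the second gives the desired one.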
Upvotes: 1
Reputation: 4790
A little lengthy, but you can try like this
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
b <- lapply(a, function(x) unlist(strsplit(x, " ")))
b
> b
[[1]]
[1] "HELLO" "XO"
[[2]]
[1] "DO" "HELLO"
[[3]]
[1] "TWO" "XO"
[[4]]
[1] "HO"
# sub("O$", ...) drops only a trailing O; gsub("O", ...) would drop every O in the word
c <- unlist(lapply(b, function(y) paste(ifelse(y == "HELLO", "HELLO", sub("O$", "", y)), collapse = " ")))
c
[1] "HELLO X" "D HELLO" "TW X" "H"
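The same split-and-rejoin idea can be written as one function. This is only a sketch: strip_final_o and its keep argument are hypothetical names, and it assumes words are separated by single spaces.

```r
# Hypothetical helper: split on spaces, strip a trailing O from every word
# that is not in `keep`, then glue the words back together.
strip_final_o <- function(x, keep = "HELLO") {
  vapply(strsplit(x, " ", fixed = TRUE), function(words) {
    paste(ifelse(words %in% keep, words, sub("O$", "", words)), collapse = " ")
  }, character(1))
}

a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
result <- strip_final_o(a)
print(result)
```

vapply with character(1) guarantees a character vector of the same length as the input, which avoids the unlist step.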
Upvotes: 0
Reputation: 521269
Try replacing using the following pattern:
\b(?!HELLO\b)(\w+)O\b
This asserts that the word is not HELLO, and then captures everything up to the final O, should it occur. The match is then replaced with the captured text, which drops that final O.
\b - from the start of the word
(?!HELLO\b) - assert that the word is not HELLO
(\w+)O - match a word ending in O, but don't capture final O
\b - end of word
The capture group, if a match happens, will contain the entire word minus the final O.
Code:
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", a, perl=TRUE)
[1] "HELLO X" "D HELLO" "TW X" "H"
Note that we must enable Perl mode (perl=TRUE) with gsub in order to use the negative lookahead.
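One edge case worth knowing about, shown as a hedged sketch: because (\w+) needs at least one character before the final O, a word consisting of just "O" is left alone. If such words should be emptied as well, (\w*) could be used instead. The sample input "O NO" and the variable names are made up for illustration.

```r
x <- "O NO"

# (\\w+) requires at least one word character before the final O,
# so the standalone word "O" does not match and is kept.
plus_result <- gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", x, perl = TRUE)

# (\\w*) also allows an empty prefix, so the standalone "O" is removed.
star_result <- gsub("\\b(?!HELLO\\b)(\\w*)O\\b", "\\1", x, perl = TRUE)

print(plus_result)
print(star_result)
```

Which behaviour is right depends on whether a lone "O" counts as a word ending in O for your data.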
Upvotes: 3
Reputation: 2998
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
aa <- gsub("O","",a)
gsub("HELL", "HELLO",aa)
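A caveat with this two-step approach, sketched below: the first call removes every O, not only the last one in each word, and the second call appends an O after any HELL, including words that never had one. The wrapper name two_step and the extra inputs are illustrative only.

```r
# Hypothetical wrapper around the two gsub calls above.
two_step <- function(x) gsub("HELL", "HELLO", gsub("O", "", x))

a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
ok <- two_step(a)                       # fine for the sample data

# Inputs where the shortcut changes more than intended:
surprises <- two_step(c("SHELL", "ROOM XO"))

print(ok)
print(surprises)
```

So it works for the sample vector, but "SHELL" gains an O and "ROOM" loses both of its inner O's.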
Upvotes: 0
Reputation: 174706
Use regex alternation operator |
a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("(HELLO)|O(?!\\S)", "\\1", a, perl=TRUE)
# [1] "HELLO X" "D HELLO" "TW X" "H"
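The regex (HELLO)|O(?!\S) does two things:
First, it captures every HELLO string.
Then it matches all the remaining O's which are not followed by a non-space character.
Replacing each match with \1 puts the captured HELLO back and deletes the other matches, since the group is empty for them.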
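The same "capture what you want to keep, match what you want to drop" trick extends to several protected words. The following is a sketch under assumptions: drop_final_o_except and keep are hypothetical names, the protected words are assumed to contain no regex metacharacters, and \\b is used instead of (?!\\S) so that punctuation also ends a word.

```r
# Hypothetical helper: protected words go into capture group 1 and are
# written back via \\1; any other word-final O matches the second branch,
# where group 1 is empty, so it is replaced by nothing.
drop_final_o_except <- function(x, keep = "HELLO") {
  pattern <- paste0("\\b(", paste(keep, collapse = "|"), ")\\b|O\\b")
  gsub(pattern, "\\1", x, perl = TRUE)
}

a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
single <- drop_final_o_except(a)
multi <- drop_final_o_except(c("HELLO XO", "ZERO TWO"), keep = c("HELLO", "ZERO"))

print(single)
print(multi)
```

Because the alternation is tried left to right at each position, a protected word is always consumed whole before the O branch gets a chance to match inside it.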
Upvotes: 1