dogged

Reputation: 33

Selectively removing trailing string

I want to remove the final letter "O" from each word, except where it is part of the word "HELLO".

I've tried doing this:

Example:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("[^HELLO]O\\>","",a)

[1] "HELLO " " HELLO" "T " "HO"

but I want

"HELLO X" "D HELLO" "TW X" "H"

Upvotes: 3

Views: 67

Answers (5)

Ivan Burlutskiy

Reputation: 1623

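Your regular expression is not correct. [^HELLO] is a character class: it matches any single character except H, E, L and O, and that character becomes part of the match, so it is deleted along with the O. What you need is to skip an O only when it is preceded by exactly HELL at the start of a word. So you should use a negative lookbehind: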

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("(?<!\\bHELL)O\\b", "", a, perl=TRUE)
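To make the character-class point concrete, the sketch below checks which single characters [^HELLO] accepts and shows that the class consumes the character in front of the O. This is why the original attempt also deleted the W in TWO. It uses base R only; the variable names are illustrative.

```r
# [^HELLO] is a character class: it matches ONE character that is not H, E, L or O
chars <- c("H", "E", "L", "O", "X", "W")
in_class <- grepl("^[^HELLO]$", chars)
print(in_class)

# The class consumes the character before the final O, so that character is
# part of each match and is deleted together with the O
x <- "TWO XO"
hits <- regmatches(x, gregexpr("[^HELLO]O\\>", x))[[1]]
print(hits)
```

Here in_class is FALSE for H, E, L and O and TRUE for X and W, and the matches are "WO" and "XO", which leaves "T " after deletion.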

Upvotes: 1

Hardik Gupta

Reputation: 4790

A little lengthy, but you can try this:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
# Split each string into its space-separated words
b <- lapply(a, function(x) unlist(strsplit(x, " ")))
b
[[1]]
[1] "HELLO" "XO"   

[[2]]
[1] "DO"    "HELLO"

[[3]]
[1] "TWO" "XO" 

[[4]]
[1] "HO"

# Keep "HELLO" as is, drop a final "O" from every other word, then rejoin the words
res <- unlist(lapply(b, function(y) paste(ifelse(y == "HELLO", "HELLO", sub("O$", "", y)), collapse = " ")))
res

[1] "HELLO X" "D HELLO" "TW X"    "H"  
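The same split-and-rejoin idea can be packaged as a small function. This is a sketch under stated assumptions: the name drop_trailing_o is hypothetical, words are assumed to be separated by single spaces, and it uses sub("O$", "", ...) so that only a word-final O is removed (a plain gsub("O", "", ...) would also strip O's in the middle of a word, turning TOO into T).

```r
# Hypothetical helper: drop a final "O" from every space-separated word
# except the word given in `keep`. Assumes single spaces between words.
drop_trailing_o <- function(strings, keep = "HELLO") {
  vapply(strsplit(strings, " ", fixed = TRUE), function(words) {
    # sub() with "O$" only touches an O at the very end of each word
    trimmed <- ifelse(words == keep, words, sub("O$", "", words))
    paste(trimmed, collapse = " ")
  }, character(1))
}

a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO", "TOO")
print(drop_trailing_o(a))
```

For this input the result is "HELLO X" "D HELLO" "TW X" "H" "TO": only the last O of TOO is removed.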

Upvotes: 0

Tim Biegeleisen

Reputation: 521269

Try replacing using the following pattern:

\b(?!HELLO\b)(\w+)O\b

This asserts that the current word is not HELLO, then captures everything before a final O, if the word ends in one. The replacement keeps only the captured text, so the final O is removed.

\b          - from the start of the word
(?!HELLO\b) - assert that the word is not HELLO
(\w+)O      - match a word ending in O, but don't capture final O
\b          - end of word

The capture group, if a match happens, will contain the entire word minus the final O.
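To see which words the pattern selects before anything is replaced, gregexpr() with regmatches() can list the whole matches. This sketch drops the capture group, since only the matched words are of interest here; hits is just an illustrative name.

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
# Same pattern without the capture group: list every whole word it matches
hits <- regmatches(a, gregexpr("\\b(?!HELLO\\b)\\w+O\\b", a, perl = TRUE))
print(hits)
```

The matches are "XO", "DO", "TWO" and "XO", and "HO"; HELLO is never selected because the lookahead rejects it at the start of the word.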

Code:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("\\b(?!HELLO\\b)(\\w+)O\\b", "\\1", a, perl=TRUE)
[1] "HELLO X" "D HELLO" "TW X"    "H"

Note that Perl mode must be enabled (perl=TRUE) in gsub in order to use the negative lookahead.
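A quick way to confirm the perl=TRUE requirement: R's default regex engine (TRE) does not support lookarounds, so the same call without perl=TRUE fails to compile the pattern and R signals a warning and an error. This sketch catches that condition instead of stopping; try_pattern is a hypothetical helper name.

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
pattern <- "\\b(?!HELLO\\b)(\\w+)O\\b"

# Hypothetical helper: return the gsub() result, or NULL if R rejects the pattern
try_pattern <- function(perl) {
  tryCatch(gsub(pattern, "\\1", a, perl = perl),
           warning = function(w) NULL,
           error = function(e) NULL)
}

print(try_pattern(perl = TRUE))            # PCRE understands (?!...)
print(is.null(try_pattern(perl = FALSE)))  # TRE cannot compile (?!...)
```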


Upvotes: 3

Shalini Baranwal

Reputation: 2998

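a <- c("HELLO XO","DO HELLO","TWO XO","HO")

# Remove every "O", then put the "O" back on "HELL"
# (this also strips O's that are not word-final and appends an O to any other "HELL", e.g. in "SHELL")
aa <- gsub("O","",a)
gsub("HELL", "HELLO",aa)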

Upvotes: 0

Avinash Raj

Reputation: 174706

Use the regex alternation operator |:

a <- c("HELLO XO","DO HELLO","TWO XO","HO")
gsub("(HELLO)|O(?!\\S)", "\\1", a, perl=TRUE)
# [1] "HELLO X" "D HELLO" "TW X"    "H"      

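The regex (HELLO)|O(?!\S) does two things:

  1. The first alternative captures every HELLO, and the replacement \\1 puts it back unchanged.

  2. The second alternative matches every remaining O that is not followed by a non-space character, i.e. an O followed by whitespace or the end of the string. Group 1 is unset for these matches, so \\1 replaces them with nothing.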
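To see the two branches at work, gregexpr() with regmatches() lists every piece the alternation matches: each HELLO comes from the first branch and is restored through \\1, while each word-final O comes from the second branch and is replaced with nothing. A sketch using base R only:

```r
a <- c("HELLO XO", "DO HELLO", "TWO XO", "HO")
# List every substring matched by either alternative
pieces <- regmatches(a, gregexpr("(HELLO)|O(?!\\S)", a, perl = TRUE))
print(pieces)
```

The pieces are "HELLO" "O", "O" "HELLO", "O" "O", and "O". The O inside HELLO is never matched on its own, because the first alternative consumes the whole word first.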

Upvotes: 1
