Max TC

Reputation: 79

Remove part of a string in R

I'm trying to remove the "es" at the end of a string:

> data <- c("phrases", "phases", "princesses","class","pass")
> data1 <- gsub("(\\w+)(s)+?es\\b", "\\1\\2", data, perl=TRUE)
> gsub("(\\w+)s\\b", "\\1", data1, perl=TRUE)
[1] "phra"    "pha"     "princes" "clas"    "pas" 

I get this result:

 [1] "phra"    "pha"     "princes" "clas"    "pas" 

But what I actually need is:

[1] "phras"    "phas"     "princess" "clas"    "pas" 

Upvotes: 0

Views: 64

Answers (2)

acylam

Reputation: 18681

You can use a word boundary (\\b) if it is guaranteed that each word is followed by a non-word character (such as punctuation or a space) or is at the end of the string:

data <- c("phrases, phases, princesses, bases")

gsub('es\\b', '', data)
# [1] "phras, phas, princess, bas"

With your approach, just wrap everything up to and including the second + in a single set of parentheses:

gsub("(\\w+s+)es\\b", "\\1", data)
# [1] "phras, phas, princess, bas"

There is also no need to make + lazy with ?, since you are trying to match as many consecutive s's as possible.
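As a quick check of that point, the sketch below runs the same pattern with a greedy and a lazy quantifier on the s. The variable names greedy and lazy are only illustrative, and perl = TRUE is passed so the lazy quantifier is handled by the PCRE engine:

```r
data <- c("phrases, phases, princesses, bases")

# Greedy quantifier on "s": takes as many consecutive s's as it can
greedy <- gsub("(\\w+s+)es\\b", "\\1", data, perl = TRUE)

# Lazy quantifier on "s": the trailing "es\\b" still forces the same overall match
lazy <- gsub("(\\w+s+?)es\\b", "\\1", data, perl = TRUE)

greedy
# [1] "phras, phas, princess, bas"
identical(greedy, lazy)
# [1] TRUE
```

Both versions give the same result here because the "es\\b" that follows pins down where the match has to end.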

Edit:

OP changed the data and the desired output. Below is a simple solution that removes either es or s at the end of each string:

data <- c("phrases", "phases", "princesses","class","pass")

gsub('(es|s)\\b', '', data)
# [1] "phras"    "phas"     "princess" "clas"     "pas" 
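To see why the two-step code in the question strips too much, it may help to run both versions side by side. This is only a sketch; the names step1, two_pass, and one_pass are made up for illustration:

```r
data <- c("phrases", "phases", "princesses", "class", "pass")

# Two passes, as in the question: the first pass removes "es",
# then the second pass also removes the "s" that was meant to stay
step1    <- gsub("(\\w+)(s)+?es\\b", "\\1\\2", data, perl = TRUE)
two_pass <- gsub("(\\w+)s\\b", "\\1", step1, perl = TRUE)

# One pass: each word loses at most one suffix, either "es" or "s"
one_pass <- gsub("(es|s)\\b", "", data)

step1
# [1] "phras"    "phas"     "princess" "class"    "pass"
two_pass
# [1] "phra"    "pha"     "princes" "clas"    "pas"
one_pass
# [1] "phras"    "phas"     "princess" "clas"     "pas"
```

The single alternation avoids the problem because a word that has already lost "es" is not scanned a second time for a trailing "s".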

Upvotes: 2

Nahuel Fouilleul

Reputation: 19315

Maybe you are looking for a lookbehind assertion (which is a zero-length match):

"(?<=s)es\\b"

Or, because a lookbehind can't have a variable length, use the Perl \K construct, which keeps everything to the left of \K out of the match (both patterns need perl = TRUE in R):

"\\ws\\Kes\\b"
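A minimal sketch of how these two patterns could be used with gsub, assuming the data vector from the question. The names lookbehind and keep_out are illustrative, and perl = TRUE is needed because lookbehind and \K are PCRE features:

```r
data <- c("phrases", "phases", "princesses", "class", "pass")

# Lookbehind: "es" is removed only when the character just before it is "s"
lookbehind <- gsub("(?<=s)es\\b", "", data, perl = TRUE)

# \K: "\\ws" has to match but is dropped from the match, so only "es" is replaced
keep_out <- gsub("\\ws\\Kes\\b", "", data, perl = TRUE)

lookbehind
# [1] "phras"    "phas"     "princess" "class"    "pass"
identical(lookbehind, keep_out)
# [1] TRUE
```

Note that both patterns only remove "es" after an "s", so "class" and "pass" are left unchanged, which differs from the output requested in the edited question.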

Upvotes: 0
