Ahdee

Reputation: 4949

R: Strange regex matching using quantifier "+"?

Suppose I have a string like this:

temp = "Big Satchel - Bird - turquoise"

I want to remove the last "-" and everything after it.

I first tested with this command, and it replaces the last match as expected:

> stringi::stri_replace_last(temp, regex = '-...', '')
[1] "Big Satchel - Bird rquoise"

However, these do not:

> stringi::stri_replace_last(temp, regex = '-.+$', '')
[1] "Big Satchel "
> stringi::stri_replace_last(temp, regex = '-.+?$', '')
[1] "Big Satchel "

So why does the pattern without the quantifier find and remove the last match, while the patterns with the quantifier do not? What I ultimately want it to print is:

Charming Satchel - Bird

Upvotes: 0

Views: 30

Answers (1)

akrun

Reputation: 887058

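We can use [^-]+ to match one or more characters that are not a -. The . is a metacharacter that matches any character, including -. So, in the OP's patterns, the match started at the first - and .+ consumed every character after it up to the end of the string. Because the regex engine scans from left to right and $ forces the match to reach the end, that is the only match, so it is also the "last" one. Making the quantifier lazy (.+?) does not help, because it still has to expand until $ can match.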

stringi::stri_replace_last(temp, regex = '\\s*-[^-]+$', '')
#[1] "Big Satchel - Bird"
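
To see why the patterns in the question behave differently, it can help to list every match each pattern finds. The following is a minimal sketch, assuming the stringi package is installed; stri_extract_all_regex and the variable names are used here only for inspection and are not part of the original question.

```r
temp <- "Big Satchel - Bird - turquoise"

# '-...' has no anchor, so it matches at both dashes;
# stri_replace_last then removes only the second match.
m_fixed <- stringi::stri_extract_all_regex(temp, '-...')[[1]]

# '-.+$' begins at the first '-' and must reach the end of the string,
# so there is exactly one match, which is also the "last" one.
m_greedy <- stringi::stri_extract_all_regex(temp, '-.+$')[[1]]

# The lazy '+?' still has to expand until '$' can match: same single match.
m_lazy <- stringi::stri_extract_all_regex(temp, '-.+?$')[[1]]

# '[^-]+' cannot cross a '-', so the match can only begin at the final dash.
m_negated <- stringi::stri_extract_all_regex(temp, '\\s*-[^-]+$')[[1]]

print(m_fixed)    # "- Bi" "- tu"
print(m_greedy)   # "- Bird - turquoise"
print(m_lazy)     # "- Bird - turquoise"
print(m_negated)  # " - turquoise"
```

Since the two quantified patterns each produce a single match that starts at the first dash, asking for the "last" match cannot change the result.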

With the current syntax, we can wrap the call in another stri_replace to get the expected output:

stringi::stri_replace(stringi::stri_replace_last(temp, 
     regex = '\\s*-[^-]+$', ''), regex = '\\w+', 'Charming')
#[1] "Charming Satchel - Bird"

Or use a single stri_replace with a capture group:

stringi::stri_replace(temp, regex = "^\\w+(\\s+.*)\\s+-[^-]+$",  "Charming$1")
#[1] "Charming Satchel - Bird"
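
The single-call version relies on the capture group (\\s+.*), which the replacement refers to as $1. As a rough sketch (again assuming stringi is installed; stri_match_first_regex and the variable names are only there for inspection and are not part of the answer above), the captured text can be checked directly:

```r
temp <- "Big Satchel - Bird - turquoise"
pattern <- "^\\w+(\\s+.*)\\s+-[^-]+$"

# stri_match_first_regex returns a matrix:
# column 1 is the full match, column 2 is capture group 1.
m <- stringi::stri_match_first_regex(temp, pattern)
captured <- m[1, 2]

# The greedy '.*' first runs to the end of the string, then backtracks
# until '\\s+-[^-]+$' can match the final " - turquoise".
result <- stringi::stri_replace_first_regex(temp, pattern, "Charming$1")

print(captured)  # " Satchel - Bird"
print(result)    # "Charming Satchel - Bird"
```

Everything outside the group (the first word and the trailing " - turquoise") is dropped, and the first word is replaced by the literal text in the replacement string.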

Upvotes: 1
