R retrieving strings with sub: Why does this not work?

Question

I would like to extract parts of strings. The string is:

> (x <- 'ab/cd efgh "xyz xyz"')
[1] "ab/cd efgh \"xyz xyz\""

Now, I would like first to extract the first part:

> # get "ab/cd efgh"
> sub(" \"[/A-Za-z ]+\"","",x)
[1] "ab/cd efgh"

But I don't succeed in extracting the second part:

> # get "xyz xyz"
> sub("(\"[A-Za-z ]+\")$","\\1",x, perl=TRUE)
[1] "ab/cd efgh \"xyz xyz\""

What is wrong with this code?
Thanks for help.

Wiktor Stribiżew · Accepted Answer

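Your last snippet does not work because you reinsert the whole match back into the result: (\"[A-Za-z ]+\")$ matches a ", one or more letters or spaces, and a closing " at the end of the string, captures all of it into Group 1, and \\1 in the replacement puts the same text back. The part of the string before the match is never touched, so the result is identical to the input.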
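To make the capture-group approach itself work, the pattern has to consume the part you want to drop as well, so that the replacement keeps only the group. A minimal sketch, assuming the quoted part always sits at the end of the string (the variable names quoted and inner are just for illustration):

```r
x <- 'ab/cd efgh "xyz xyz"'

# Match the whole string, but keep only the quoted part captured in Group 1
quoted <- sub('^.*("[A-Za-z ]+")$', "\\1", x)
print(quoted)

# Move the quotes outside the group to drop them from the result as well
inner <- sub('^.*"([A-Za-z ]+)"$', "\\1", x)
print(inner)
```

Using single quotes for the R string avoids having to escape the double quotes inside the pattern; the backreference still needs a doubled backslash.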

You can get the last, quoted part by removing all characters other than " from the start of the string:

x <- 'ab/cd efgh "xyz xyz"'
sub('^[^"]+', "", x)

Here, sub finds and replaces only once: the pattern matches the start of the string (^) followed by one or more characters other than " (the negated character class [^"]+).
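The "replace just once" point can be checked with a small sketch that compares sub with gsub on an unanchored version of the pattern. The extra step that strips the quotes is an assumption about what you may want next, not part of the answer above:

```r
x <- 'ab/cd efgh "xyz xyz"'

# Drop everything from the start of the string up to the first double quote
second <- sub('^[^"]+', "", x)
print(second)  # the double quotes are still part of the value

# Without the ^ anchor, gsub would also remove the text between the quotes
all_removed <- gsub('[^"]+', "", x)
print(all_removed)

# Optional extra step: strip the surrounding quotes from the extracted part
second_bare <- gsub('"', "", second, fixed = TRUE)
print(second_bare)
```

With sub the anchor is not strictly required for this input, since the first run of non-quote characters already starts at the beginning of the string, but it makes the intent explicit.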
