Allan A

Reputation: 437

How to extract the part of a string between a specific set of quotes with stringr?

I need to extract a specific part of a string that sits between a set of quotes, and I need to do it without also capturing later parts of the string that are between quotes.

For example, suppose I want to extract only the value between the quotes after the viewBox attribute in this markup tag:

"<svg height=\"512pt\" viewBox=\"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"

The values I need to extract can contain a wide variety of characters, which is why I tried the [:print:] character class in stringr. However, I have not been able to limit the extraction to the desired set of quotes. Moreover, viewBox is only an example, so please no solutions that are specific to that attribute.

string <- "<svg height=\"512pt\" viewBox=\"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"

string %>% 
  str_extract("(?<= viewBox=\")[:print:]+(?<!\" )")

The current result is:

"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"

Whereas the desired result is:

"-9 0 512 512"

Upvotes: 1

Views: 50

Answers (1)

akrun

Reputation: 887148

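We can match one or more characters that are not a double quote (") after the regex lookbehind. The attempt in the question overshoots because [:print:] also matches ", so the greedy [:print:]+ runs on to the end of the string.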

library(stringr)
str_extract(string, '(?<=viewBox=")[^"]+')
#[1] "-9 0 512 512"
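
Since viewBox is only an example, the same pattern can be built for any attribute name. The following is a minimal sketch, not part of the original answer: extract_attr is a hypothetical helper, and it assumes the attribute name contains no regex metacharacters, is preceded by whitespace, and has a double-quoted value.

```r
library(stringr)

# Hypothetical helper (not a stringr function): returns the double-quoted
# value that follows `attr="` in each element of x.
# Assumes `attr` contains no regex metacharacters.
extract_attr <- function(x, attr) {
  # The lookbehind requires whitespace before the name, so "width" does not
  # match inside a longer name such as "stroke-width".
  # [^"]* stops at the closing quote and also allows an empty value.
  pattern <- paste0('(?<=\\s', attr, '=")[^"]*')
  str_extract(x, pattern)
}

svg_tag <- '<svg height="512pt" viewBox="-9 0 512 512" width="512pt" xmlns="http://www.w3.org/2000/svg">'

extract_attr(svg_tag, "viewBox")
#[1] "-9 0 512 512"
extract_attr(svg_tag, "width")
#[1] "512pt"
```

For anything beyond a single tag, a proper XML parser is probably more robust than a regex.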

Upvotes: 1
