Reputation: 437
I need to extract a specific part of a string that sits between a pair of quotes, without also capturing later parts of the string that are between quotes as well.
For example, suppose I want to extract only the value between the quotes after the viewBox attribute in this markup tag:
"<svg height=\"512pt\" viewBox=\"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"
I would like to be able to extract a wide variety of characters, and because of that I have tried the [:print:] class in stringr. However, I have not been able to limit the extraction to the desired pair of quotes. Moreover, viewBox is only an example, so please no solutions that are specific to it.
library(stringr)
string <- "<svg height=\"512pt\" viewBox=\"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"
string %>%
str_extract("(?<= viewBox=\")[:print:]+(?<!\" )")
The current result is:
"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"
Whereas the desired result is:
"-9 0 512 512"
Upvotes: 1
Views: 50
Reputation: 887148
We can match one or more characters that are not a double quote (") immediately after a regex lookbehind, so the match stops at the closing quote:
library(stringr)
str_extract(string, '(?<=viewBox=")[^"]+')
#[1] "-9 0 512 512"
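The original pattern over-matches because [:print:] also matches the double quote and + is greedy, so the match runs to the end of the string, where the trailing negative lookbehind is trivially satisfied. Since viewBox was only an example, the same idea can be wrapped in a small helper. This is a minimal sketch: extract_attr is a hypothetical name, not part of stringr, and it assumes the attribute name contains no regex metacharacters and that the value is non-empty and double-quoted.

```r
library(stringr)

# Hypothetical helper (not a stringr function): return the double-quoted
# value that follows `attr="` in x.
extract_attr <- function(x, attr) {
  # The lookbehind anchors on the attribute name and opening quote;
  # [^"]+ then consumes characters up to, but not including, the closing quote.
  pattern <- paste0('(?<=', attr, '=")[^"]+')
  str_extract(x, pattern)
}

string <- "<svg height=\"512pt\" viewBox=\"-9 0 512 512\" width=\"512pt\" xmlns=\"http://www.w3.org/2000/svg\">"

extract_attr(string, "viewBox")
#[1] "-9 0 512 512"
extract_attr(string, "width")
#[1] "512pt"
```

Because the negated class [^"] accepts anything except the quote itself, it covers the "wide variety of characters" requirement without needing [:print:]. If the attribute is missing, str_extract returns NA.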
Upvotes: 1