Reputation: 726
I am using R to extract words from short pieces of text. Specifically, I want to extract any word that appears in double quotes (") from a string, but not when the quoted word is inside parentheses ().
For instance, I would like to extract "hello" from the first of these three strings, but not from the other two:
c('"hello" world', 'hello world', '("hello") world')
Original code attempt
str_extract(x, '(?<=")[^$]+(?<=")')
Upvotes: 1
Views: 248
Reputation: 887891
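We can use a regex lookbehind to grab the word after a quote, and grepl to return NA for strings where the quote directly follows an opening parenthesis: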
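library(stringr)
# sample data from the question
str1 <- c("\"hello\" world", "hello world", "(\"hello\") world")
# NA where a quote directly follows "(", otherwise the word after the first quote
ifelse(grepl('\\("', str1), NA, str_extract(str1, '(?<=")\\w+'))
#[1] "hello" NA NA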
Upvotes: 1
Reputation: 786091
You may use this regex with nested lookarounds in str_extract:
(?<=(?<!\()")[^"]+(?=(?!\))")
RegEx Details:
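(?<=(?<!\()"): Assert that we have a " before the current position, and that there is no ( before that "
[^"]+: Match 1+ of any characters that are not "
(?=(?!\))"): Assert that we have a " after the current position. The nested (?!\)) is tested at the same position as that ", so it cannot fail there; it is the ( check in the lookbehind that rejects the ("hello") case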
Code:
str_extract(x, '(?<=(?<!\\()")[^"]+(?=(?!\\))")')
or avoid double escaping by using a character class:
str_extract(x, '(?<=(?<![(])")[^"]+(?=(?![)])")')
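For completeness, here is a minimal runnable sketch that applies the character-class version to the sample vector from the question. The names pattern, strict and result are just illustrative. The second pattern is my own variation, not part of the original answer: it moves the negative lookahead after the closing quote, on the assumption that you also want to reject a quoted word that is followed by ) even when no ( precedes it.

```r
library(stringr)

# Sample input from the question
x <- c('"hello" world', 'hello world', '("hello") world')

# Pattern from the answer (parentheses in character classes, no double escaping)
pattern <- '(?<=(?<![(])")[^"]+(?=(?![)])")'

# Hypothetical stricter variant: the (?![)]) check runs after the closing quote
strict <- '(?<=(?<![(])")[^"]+(?="(?![)]))'

# One result per input string; NA where nothing qualifies
result <- str_extract(x, pattern)
print(result)

# The two patterns only differ when ")" follows the closing quote
# without a "(" before the opening quote
print(str_extract('"hello") world', pattern))
print(str_extract('"hello") world', strict))
```

On the three strings from the question both patterns should behave the same, so the stricter one only matters if your data can contain an unmatched closing parenthesis.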
Upvotes: 2