Laurence_jj

Reputation: 726

Extract word in quotes from string

I am using R to extract words from short pieces of text. Specifically, I want to extract any word that appears in double quotes (") in a string, but not when it appears inside parentheses ().

For instance, I would like to extract "hello" from the first of these three strings, but not from the other two:

x <- c('"hello" world', 'hello world', '("hello") world')

Original code attempt

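library(stringr)
# Does not work: the match must end right after a ", so it runs up to the
# closing quote and includes it, and nothing in the pattern excludes the (" case
str_extract(x, '(?<=")[^$]+(?<=")')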

Upvotes: 1

Views: 248

Answers (2)

akrun

Reputation: 887891

We can flag strings that contain (" with grepl and use a regex lookbehind to extract the word after the opening quote:

library(stringr)
# NA if the string contains (" anywhere; otherwise the word characters after the first "
ifelse(grepl('\\("', str1), NA, str_extract(str1, '(?<=")\\w+'))
#[1] "hello" NA      NA    

data

str1 <- c("\"hello\" world", "hello world", "(\"hello\") world")
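If stringr is not available, the same two-step idea can be sketched in base R. This is a minimal sketch, not the answer's own code: it assumes PCRE (perl = TRUE) for the lookbehind, uses regexpr() to get the first match in each string (or -1 when there is none), and uses regmatches() to pull out the matched text.

```r
# Base R sketch of the grepl + lookbehind approach (no stringr needed)
str1 <- c("\"hello\" world", "hello world", "(\"hello\") world")

# Word characters right after the first double quote; m is -1 where there is none
m <- regexpr('(?<=")\\w+', str1, perl = TRUE)
words <- rep(NA_character_, length(str1))
words[m != -1] <- regmatches(str1, m)

# Blank out every string that contains (" anywhere
words[grepl('\\("', str1)] <- NA
words
```

Because the grepl step tests the whole string, a hypothetical string such as '"hi" and ("hello")' would also give NA here, even though "hi" sits outside the parentheses.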

Upvotes: 1

anubhava

Reputation: 786091

You may use this regex with nested lookarounds in str_extract:

(?<=(?<!\()")[^"]+(?="(?!\)))

RegEx Details:

  • (?<=(?<!\()"): Assert that the previous character is " and that this " is not itself preceded by (
  • [^"]+: Match one or more characters other than "
  • (?="(?!\))): Assert that the next character is " and that this " is not itself followed by )

Code:

library(stringr)
str_extract(x, '(?<=(?<!\\()")[^"]+(?="(?!\\)))')

or avoid double escaping by using a character class:

str_extract(x, '(?<=(?<![(])")[^"]+(?="(?![)]))')
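For a version without stringr, a lookaround regex of this kind can be run through base R's regexpr() and regmatches() with perl = TRUE. This is a sketch under stated assumptions, not the answer's exact code: it assumes PCRE accepts the nested lookbehind (PCRE allows assertions inside a fixed-length lookbehind), it places the ")" check after the closing quote, and the fourth test string is a hypothetical addition.

```r
# Base R sketch (no stringr): PCRE lookarounds via perl = TRUE
x <- c('"hello" world', 'hello world', '("hello") world', '"hi" and ("hello")')

# Text between double quotes, where the opening " is not preceded by "("
# and the closing " is not followed by ")"
pattern <- '(?<=(?<![(])")[^"]+(?="(?![)]))'

m <- regexpr(pattern, x, perl = TRUE)  # first match per string, -1 if none
res <- rep(NA_character_, length(x))
res[m != -1] <- regmatches(x, m)
res
```

Because the lookarounds only inspect the characters right next to each quote, the quoted word outside the parentheses in the fourth string is still found, whereas a whole-string grepl('\\("', ...) check would reject that string entirely.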

Upvotes: 2
