Reputation: 23161
There are quite a few similar questions already but none of them works in my case. I have a string that contains multiple substrings inside double quotes and these substrings can contain escaped double quotes.
For example, for the string 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.', the expected result is an array with two elements:
"this is some sample text with quotes and \"escaped quotes\" inside"
"here is \"another\" one"
The /"(?:\\"|[^"])*"/g regex works as expected on regex101; however, when I use String#match() the result is different. Check out the snippet below:
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
console.log(str.match(regex))
Instead of two matches, I got four, and the text inside the escaped quotes is not even included.
MDN mentions that if the g flag is used, all results matching the complete regular expression will be returned, but capturing groups will not. If I want to obtain capture groups and the global flag is set, I need to use RegExp.exec(). I've tried it, and the result is the same:
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
let temp
let matches = []
while ((temp = regex.exec(str)) !== null) {
  matches.push(temp[0])
}
console.log(matches)
How could I get an array with those two matched elements?
Upvotes: 4
Views: 4547
Reputation: 18611
Another option is a more efficient regex that avoids the | alternation operator:
const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))
Using String.raw, there is no need to escape the backslashes twice.
See the regex proof. By the way, it takes 28 steps vs. 267 steps for the original pattern.
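Besides the step count, the two patterns can also differ in what they match. The sketch below is only an illustration on a made-up input (the edgeCase string is hypothetical, not taken from the question): a quoted substring that ends with an escaped backslash. The alternation pattern treats the second backslash plus the closing quote as an escaped quote and runs past the end of the substring, while the unrolled pattern consumes the backslashes in pairs.

```javascript
// Hypothetical input: the first quoted substring ends with an escaped backslash (\\).
const edgeCase = String.raw`"a\\" b "c"`;

const alternationRegex = /"(?:\\"|[^"])*"/g;             // pattern from the question
const unrolledRegex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;  // pattern from this answer

// The alternation pattern overshoots the real closing quote.
console.log(edgeCase.match(alternationRegex)); // one match: "a\\" b "

// The unrolled pattern stops at the real closing quote and finds both substrings.
console.log(edgeCase.match(unrolledRegex));    // two matches: "a\\" and "c"
```

If the input can never contain an escaped backslash, both patterns return the same matches and the difference is only in performance.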
EXPLANATION
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  [^"\\]*                  any character except: '"', '\\' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'
Upvotes: 3
Reputation: 10201
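The regex doesn't work as expected because a single backslash is an escape character inside a JavaScript string literal: \" produces just ", so the string in the question contains no backslashes at all. You'll need to escape the backslashes in the text: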
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.';
let regex = /"(?:\\"|[^"])*"/g
console.log(str);
console.log(str.match(regex))
str = 'And then, "this is some sample text with quotes and \\"escaped quotes\\" inside". Not that we need more, but... "here is \\"another\\" one". Just in case.';
console.log(str);
console.log(str.match(regex))
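To see what the string literal actually stores, here is a minimal sketch with a shorter, made-up sample (the variable names and the text are only illustrative). It compares a single backslash, a doubled backslash, and String.raw:

```javascript
const plainLiteral = 'say \"hi\"';         // \" is just an escaped quote: no backslash is stored
const doubledLiteral = 'say \\"hi\\"';     // \\ stores one literal backslash
const rawLiteral = String.raw`say \"hi\"`; // String.raw keeps the backslashes as typed

console.log(plainLiteral);                    // say "hi"
console.log(doubledLiteral);                  // say \"hi\"
console.log(plainLiteral.includes('\\'));     // false
console.log(doubledLiteral === rawLiteral);   // true
```

Since the first string has no backslashes, the \\" branch of the regex can never match in it, and every quote is treated as a delimiter.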
Upvotes: 2