Reputation: 8628
I need a regex for JavaScript that matches
"{any group of chars}" <-- where that closing " is not preceded by a \
examples:
... foo "bar" ... => "bar"
... foo"bar\"" ... => "bar\""
... foo "bar" ... goo"o"ooogle "t\"e\"st"[] => ["bar", "o", "t\"e\"st"]
The actual strings will be longer and may contain multiple matches that could also include white space or regex special chars.
I started by trying to break down the syntax, but since I'm not strong with regex I got stuck pretty fast. I did get as far as matching everything except the case where the match contains \" (I think):
https://regex101.com/r/sj4HXw/1
UPDATE:
More about my situation ...
This regex is to be used to "syntax highlight" strings in code blocks embedded in my blog posts so a real world example might look something like this ...
<pre id="test" class="code" data-code="csharp">
if (ConfigurationManager.AppSettings["LogSql"] == "true")
</pre>
And I am using the following JavaScript to apply the highlight:
var result = $("#test").text().replace(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g, "<span class=\"string\">$1</span>");
$("#test").html(result);
For some reason, even when the suggested answers (so far at least) are used in this context, I'm getting odd results.
This works, but it inserts the literal text $1 instead of the actual match for some reason.
Upvotes: 2
Views: 1857
Reputation: 627536
The most efficient regex (that is written in accordance with the unroll-the-loop principle) you may use here is
"[^"\\]*(?:\\[\s\S][^"\\]*)*"
See the regex demo
Details:
- " - matches the opening "
- [^"\\]* - 0+ chars other than " and \
- (?:\\[\s\S][^"\\]*)* - zero or more occurrences of:
  - \\[\s\S] - any char ([\s\S]) with a \ in front
  - [^"\\]* - 0+ chars other than " and \
- " - a closing "

Usage:
// MATCHING
var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
var s = ' ... foo "bar" ... goo"o"ooogle "t\\"e\\"st"[]';
var res = s.match(rx);
console.log(res);
// REPLACING
console.log(s.replace(rx, '<span>$&</span>'));
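For the highlighting case from the question, the same idea can be sketched as below. The helper name `highlightStrings` is hypothetical, and the sample line is the one from the question's `<pre>` block. The point it illustrates is that the pattern has no capturing group, so `$1` has nothing to refer to and is left as literal text, while `$&` stands for the whole match.

```javascript
// Hypothetical helper (not from the original post): wraps every
// double-quoted string literal in a <span class="string">.
function highlightStrings(code) {
  var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
  // $& inserts the entire match. $1 would only work if the pattern
  // had a capturing group, which this one does not.
  return code.replace(rx, '<span class="string">$&</span>');
}

var line = 'if (ConfigurationManager.AppSettings["LogSql"] == "true")';
console.log(highlightStrings(line));

// For comparison: with no capturing group, $1 stays as literal text.
console.log('say "hi"'.replace(/"[^"]*"/g, '[$1]'));
```

Wrapping the whole pattern in parentheses and keeping `$1` would presumably work as well; `$&` just avoids the extra group.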
If there is an escaped " before a valid match, or there are \ chars before a ", the approach above won't work. You will need to match those \ chars and capture the substring you need.
/(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g
 ^^^^^^^^^^^^^^^^^^^^^^                             ^
See another regex demo.
Usage:
// MATCHING
var rx = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;
var s = ' ... \\"foo "bar" ... goo"o"ooogle "t\\"e\\"st"[]';
var m, res=[];
while (m = rx.exec(s)) {
res.push(m[1]);
}
console.log(res);
// REPLACING
console.log(s.replace(/((?:^|[^\\])(?:\\{2})*)("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g, '$1<span>$2</span>'));
The main pattern is wrapped with capturing parentheses, and this is added at the start:
- (?:^|[^\\]) - either the start of the string or any char but \
- (?:\\{2})* - 0+ occurrences of a double backslash.
Upvotes: 2
Reputation: 3444
This should do it:
"(\\[\s\S]|[^"\\])*"
It's a mixture of the other answers from Wiktor and Taufik.
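One place where excluding \ from the character class seems to matter is an unterminated string that ends in an escaped quote. The sketch below is only an illustration; the input and the variable names `strict` and `loose` are made up for the comparison.

```javascript
// Input text is: x = "abc\" + y   (the string literal is never closed)
var unterminated = 'x = "abc\\" + y';

// Pattern from this answer: a backslash can only be consumed
// together with the character that follows it.
var strict = /"(\\[\s\S]|[^"\\])*"/g;
// Variant whose class also accepts a lone backslash, so the engine
// can backtrack and treat the escaped quote as the closing quote.
var loose = /"(\\.|[^"])*"/g;

console.log(unterminated.match(strict)); // null
console.log(unterminated.match(loose));  // [ '"abc\\"' ]
```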
Upvotes: 0
Reputation: 3409
Prioritize the escaped characters first:
"(\\.|[^"])*"
https://regex101.com/r/sj4HXw/2
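To see why the order of the alternatives matters here, the following sketch compares the pattern above with a reversed variant. The reversed pattern and the variable names are hypothetical, added only for comparison; the input is taken from the question's third example.

```javascript
// Input text is: goo"o"ooogle "t\"e\"st"[]
var s = 'goo"o"ooogle "t\\"e\\"st"[]';

// Escape branch first: a backslash always takes the next char with it.
var escapedFirst = /"(\\.|[^"])*"/g;
// Reversed order: [^"] grabs the backslash on its own, so the
// escaped quote that follows is treated as the closing quote.
var escapedLast = /"([^"]|\\.)*"/g;

console.log(s.match(escapedFirst)); // [ '"o"', '"t\\"e\\"st"' ]
console.log(s.match(escapedLast));  // [ '"o"', '"t\\"', '"st"' ]
```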
Upvotes: 4