War

Reputation: 8628

Matching quote wrapped strings in javascript with regex

I need a JavaScript regex that matches

"{any group of chars}" <-- where that last " is not preceded by a \

examples:

... foo "bar" ...  => "bar"
... foo"bar\"" ... => "bar\""
... foo "bar" ...  goo"o"ooogle "t\"e\"st"[] => ["bar", "o", "t\"e\"st"]

The actual strings will be longer and may contain multiple matches that could also include white space or regex special chars.

I started by trying to break down the syntax, but since I'm not strong with regex I got stuck pretty quickly. I did get as far as matching everything except the case where the match contains \" (I think):

https://regex101.com/r/sj4HXw/1

UPDATE:

More about my situation ...

This regex is to be used to "syntax highlight" strings in code blocks embedded in my blog posts so a real world example might look something like this ...

<pre id="test" class="code" data-code="csharp">
   if (ConfigurationManager.AppSettings["LogSql"] == "true")
</pre>

And I am using the following JavaScript to apply the highlighting:

var result = $("#test").text().replace(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g, "<span class=\"string\">$1</span>");
$("#test").html(result);

For some reason, even when the suggested answers (so far, at least) are used in this context, I'm getting odd results.

This works, but it inserts the literal text $1 instead of the actual match.

Upvotes: 2

Views: 1857

Answers (3)

Wiktor Stribiżew

Reputation: 627536

Simple scenario (as in OP)

The most efficient regex (that is written in accordance with the unroll-the-loop principle) you may use here is

"[^"\\]*(?:\\[\s\S][^"\\]*)*"

See the regex demo

Details:

  • " - match the first "
  • [^"\\]* - 0+ chars other than " and \
  • (?:\\[\s\S][^"\\]*)* - zero or more occurrences of:
    • \\[\s\S] - any char ([\s\S]) with a \ in front
    • [^"\\]* - 0+ chars other than " and \
  • " - a closing ".

Usage:

// MATCHING
var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
var s = '    ... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
var res = s.match(rx);
console.log(res);

// REPLACING
console.log(s.replace(rx, '<span>$&</span>'));
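
On the $1 issue raised in the question: in String.prototype.replace, $1 refers to the first capturing group, and this pattern has none (the (?:...) group is non-capturing), so the text $1 is left in the output as-is. $& stands for the whole match. A minimal sketch, assuming the span markup from the question; highlight is a hypothetical helper name, not part of any library:

```javascript
// Pattern from this answer; note that it contains no capturing group.
var rx = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;

// Hypothetical helper: wraps every quoted string in a span.
function highlight(code) {
  // $& inserts the whole match.
  return code.replace(rx, '<span class="string">$&</span>');
}

var line = 'if (ConfigurationManager.AppSettings["LogSql"] == "true")';
var highlighted = highlight(line);
console.log(highlighted);

// With $1 and no capturing group, "$1" is emitted literally.
var literal = line.replace(rx, '<span>$1</span>');
console.log(literal);
```

The other option is to wrap the whole pattern in a capturing group and keep $1 in the replacement.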

More advanced scenario

If there is an escaped " before a valid match, or there are backslashes before a ", the approach above won't work. You will need to match those backslashes and capture the substring you need.

/(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g
 ^^^^^^^^^^^^^^^^^^^^^^                             ^

See another regex demo.

Usage:

// MATCHING
var rx = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;
var s = '    ... \\"foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';
var m, res=[];
while (m = rx.exec(s)) {
  res.push(m[1]);
}
console.log(res);

// REPLACING
console.log(s.replace(/((?:^|[^\\])(?:\\{2})*)("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g, '$1<span>$2</span>'));

The main pattern is wrapped with capturing parentheses, and this is added at the start:

  • (?:^|[^\\]) - either start of string or any char but \
  • (?:\\{2})* - 0+ occurrences of a double backslash.
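
To see what the added prefix changes, here is a small side-by-side sketch of the two patterns on a string that starts with an escaped quote. The variable names are only for illustration:

```javascript
var simple = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;
var advanced = /(?:^|[^\\])(?:\\{2})*("[^"\\]*(?:\\[\s\S][^"\\]*)*")/g;

// Actual text:  \"foo "bar" baz
var text = '\\"foo "bar" baz';

// The simple pattern starts at the escaped quote and pairs it
// with the opening quote of "bar".
var simpleMatches = text.match(simple);
console.log(simpleMatches); // [ '"foo "' ]

// The advanced pattern skips the escaped quote; the string itself
// is in capture group 1, so collect m[1] rather than m[0].
var advancedMatches = [];
var m;
while ((m = advanced.exec(text)) !== null) {
  advancedMatches.push(m[1]);
}
console.log(advancedMatches); // [ '"bar"' ]
```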

Upvotes: 2

David Knipe

Reputation: 3444

This should do it:

"(\\[\s\S]|[^"\\])*"

It's a mixture of the other answers from Wiktor and Taufik.
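
A short usage sketch on the third example from the question. One thing to keep in mind: the group here is a repeated capturing group, so $1 in a replacement holds only its last repetition, not the whole string; $& is the whole match.

```javascript
var knipeRx = /"(\\[\s\S]|[^"\\])*"/g;

// Actual text: ... foo "bar" ...  goo"o"ooogle "t\"e\"st"[]
var sample = '... foo "bar" ...  goo"o"ooogle "t\\"e\\"st"[]';

// With the g flag, match() returns the full matches.
var found = sample.match(knipeRx);
console.log(found); // [ '"bar"', '"o"', '"t\\"e\\"st"' ]

// $1 is the last character matched inside the quotes.
var lastOnly = '"bar"'.replace(knipeRx, '[$1]');
console.log(lastOnly); // [r]

// $& is the whole quoted string.
var whole = '"bar"'.replace(knipeRx, '[$&]');
console.log(whole); // ["bar"]
```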

Upvotes: 0

Taufik Nurrohman

Reputation: 3409

Prioritize the escaped characters first:

"(\\.|[^"])*"

https://regex101.com/r/sj4HXw/2
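
One caveat worth checking for your input: because [^"] also accepts a backslash, the engine can backtrack and treat an escaped quote as the closing one when no real closing quote follows. The sketch below compares the pattern above with a variant that excludes the backslash from the character class; the variant and the variable names are assumptions added for illustration, not part of this answer.

```javascript
var loose = /"(\\.|[^"])*"/g;     // pattern from this answer
var strict = /"(\\.|[^"\\])*"/g;  // assumed variant: backslash excluded from the class

// Actual text: "t\"e\"st"   (properly terminated)
var terminated = '"t\\"e\\"st"';
// Actual text: "abc\"       (the only trailing quote is escaped)
var unterminated = '"abc\\"';

// Both patterns agree on a well-formed string.
var looseOk = terminated.match(loose);
var strictOk = terminated.match(strict);
console.log(looseOk, strictOk);

// The loose pattern backtracks and accepts the escaped quote as the end.
var looseBad = unterminated.match(loose);
console.log(looseBad); // [ '"abc\\"' ]

// The strict pattern finds no match.
var strictBad = unterminated.match(strict);
console.log(strictBad); // null
```

Note also that . does not match line terminators in JavaScript unless the s flag is set.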

Upvotes: 4
