MosheK

Reputation: 1196

JavaScript regex to match anything between single quotes, double quotes and regex slashes

I'm trying to match anything between double quotes, single quotes, or regex slashes, basically anything that JavaScript tokenizes as a string or regex literal. So far, what I've come up with is:

/"[^\\"\n]*(\\"[^\\"\n]*)*"|'[^\\'\n]*(\\'[^\\'\n]*)*'|\/[^\\\/\n]*(\\\/[^\\\/\n]*)*\//

But there are a couple of problems with this, as you can see here:

http://goo.gl/4Yn9pR

Basically, this shouldn't match 1+2/3+4/5, since that isn't a regex. Also,
Dont match "Match here\\" Dont match" should match the first part and not the second (that's true for single quotes and regexes too).

How should this be written?

Edit: If it's not possible to differentiate between 1+2/3+4/5, /*comment*/ and /regex/ using regular expressions, how would I solve just the Dont match "Match here\\" Dont match" problem?

Upvotes: 1

Views: 1209

Answers (2)

georg

Reputation: 215049

The trick for matching C-like escaped strings is this:

" (\\. | [^"]) * "

That is,

 - quote
 - repeat (
    - one escaped char
    - or not a quote
   )
 - quote

Similarly for single quotes. Here's an illustration in Python, since JS regexes are ugly:

import re

test = r"""
    foo "bar" and "bar\"bar" and "bar\\bar" and "bar \\"
    foo 'bar' and 'bar\'bar' and 'bar\\bar' and 'bar \\'
"""

rr = r"""(?x)
    " (\\. | [^"]) * "
    |
    ' (\\. | [^']) * '
"""

# Replace every quoted string with a placeholder to show what matched.
print(re.sub(rr, '@@', test))

> foo @@ and @@ and @@ and @@
> foo @@ and @@ and @@ and @@

It might be necessary to add a newline to the negated character class, i.e. [^"\n], so that a match can't run past the end of a line.
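Since the question is about JavaScript, here is a rough sketch of the same pattern as a JS regex literal. JS has no free-spacing mode, so the whitespace is dropped. The names stringLiteral, sample and replaced are made up for this illustration, and excluding \n from the character classes is an assumption (strings are taken not to span lines).

```javascript
// JavaScript rendering of the pattern above, written without free-spacing mode.
// Assumption: strings may not span lines, so \n is excluded from each class.
const stringLiteral = /"(?:\\.|[^"\n])*"|'(?:\\.|[^'\n])*'/g;

// String.raw keeps the backslashes exactly as typed.
const sample = String.raw`foo "bar" and "bar\"bar" and "bar\\bar" and "bar \\"`;

// Replace every quoted string with a placeholder to show what matched.
const replaced = sample.replace(stringLiteral, '@@');
console.log(replaced); // foo @@ and @@ and @@ and @@
```

The escaped-character alternative comes first in each group, so a backslash is always consumed together with the character after it before the "not a quote" alternative gets a chance.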

Do note that this expression is quite forgiving and allows many constructs that aren't valid JavaScript. See https://stackoverflow.com/a/13800082/989121 for the complete and accurate implementation.

Upvotes: 0

MosheK

Reputation: 1196

Just figured it out. I was very close. Here's the solution:

/"[^\\"\n]*(\\["\\][^\\"\n]*)*"|'[^\\'\n]*(\\['\\][^\\'\n]*)*'|\/[^\\\/\n]*(\\[\/\\][^\\\/\n]*)*\//

DEMO

It's very similar to thg435's answer, but I think it's a little more performant because it doesn't backtrack as much.

What I was missing: when looking for an escaped quote, I should also have been looking for an escaped backslash, so I changed \\" to \\["\\]. thg435's answer instead accepts any character after a backslash, which is valid but can use up more states in the regex engine.
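To make the difference concrete, here is a small sketch that runs only the double-quote branch of the original and the fixed pattern against the problem input from the question. The variable names are made up for this illustration.

```javascript
// Double-quote branch only, to keep the comparison short.
const original = /"[^\\"\n]*(\\"[^\\"\n]*)*"/g;
const fixed = /"[^\\"\n]*(\\["\\][^\\"\n]*)*"/g;

// String.raw keeps the two backslashes before the closing quote as typed.
const input = String.raw`Dont match "Match here\\" Dont match"`;

const originalMatches = input.match(original);
const fixedMatches = input.match(fixed);

// The original cannot get past the escaped backslash, so it restarts at the
// string's closing quote and matches the wrong part.
console.log(originalMatches[0]); // " Dont match"

// The fixed pattern consumes the escaped backslash and stops at the real end.
console.log(fixedMatches[0]); // "Match here\\"

// Only \" and \\ are accepted as escapes, so other escapes are not matched.
const otherEscape = String.raw`"a\nb"`;
console.log(otherEscape.match(fixed)); // null
```

The last case is worth keeping in mind: a string containing an escape such as \n is skipped by the fixed pattern, whereas a pattern that accepts any character after a backslash would match it.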

Upvotes: 0
