Reputation: 1196
I'm trying to match anything between double quotes, single quotes, or regex slashes: basically, anything that JavaScript tokenizes as a string or a regex. So far, what I've come up with is:
/"[^\\"\n]*(\\"[^\\"\n]*)*"|'[^\\'\n]*(\\'[^\\'\n]*)*'|\/[^\\\/\n]*(\\\/[^\\\/\n]*)*\//
But there are a couple of problems with this. First, it shouldn't match anything in 1+2/3+4/5, since that isn't a regex. Second, for the input
Dont match "Match here\\" Dont match"
it should match the first part and not the second (that's true for single quotes and regexes too).
How should this be written?
Edit: If it's not possible to differentiate between 1+2/3+4/5, /*comment*/ and /regex/ using regular expressions, how would I solve just the Dont match "Match here\\" Dont match" problem?
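Both failures can be reproduced with a short script. This is only a sketch: the pattern is ported from the JS literal to Python's re module (the `\/` escapes become plain `/`), and the helper name `matches` and the two sample variables are made up for illustration.

```python
import re

# The question's pattern, ported from the JS literal to Python's re syntax.
ORIGINAL = re.compile(
    r'"[^\\"\n]*(\\"[^\\"\n]*)*"'    # double-quoted string
    r"|'[^\\'\n]*(\\'[^\\'\n]*)*'"   # single-quoted string
    r"|/[^\\/\n]*(\\/[^\\/\n]*)*/"   # regex literal
)

def matches(pattern, text):
    """Return the full text (group 0) of every match, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

division = "1+2/3+4/5"
escaped = r'Dont match "Match here\\" Dont match"'

# The two division slashes get paired up as if they delimited a regex.
print(matches(ORIGINAL, division))
# The escaped backslash is not understood, so the wrong half is matched.
print(matches(ORIGINAL, escaped))
```

The first call returns the span /3+4/, and the second returns " Dont match" rather than "Match here\\".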
Upvotes: 1
Views: 1209
Reputation: 215049
The trick to matching C-like escaped strings is this:
" (\\. | [^"]) * "
That is,
- quote
- repeat (
  - one escaped char
  - or not a quote
  )
- quote
The same works for single quotes. Here is an illustration in Python, since JS regexes are ugly:
import re
test = r"""
foo "bar" and "bar\"bar" and "bar\\bar" and "bar \\"
foo 'bar' and 'bar\'bar' and 'bar\\bar' and 'bar \\'
"""
rr = r"""(?x)
" (\\. | [^"]) * "
|
' (\\. | [^']) * '
"""
print(re.sub(rr, '@@', test))
> foo @@ and @@ and @@ and @@
> foo @@ and @@ and @@ and @@
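It might be necessary to add newlines to the [^"] and [^'] character classes (that is, [^"\n] and [^'\n]) so that a string cannot run past the end of a line.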
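A minimal sketch of what the newline exclusion changes, assuming the double-quote half of the pattern only. The names `LOOSE` and `STRICT` and the sample text are invented for the example, and a non-capturing group `(?:...)` stands in for the capturing one, which doesn't alter what is matched.

```python
import re

# The double-quote pattern as written above: [^"] also accepts a newline.
LOOSE = re.compile(r'"(?:\\.|[^"])*"')
# Same pattern with \n excluded, so a match cannot cross a line break.
STRICT = re.compile(r'"(?:\\.|[^"\n])*"')

# Line 1 has an unterminated string; line 2 has a well-formed one.
text = 'a = "unterminated\nb = "ok"'

loose = [m.group(0) for m in LOOSE.finditer(text)]
strict = [m.group(0) for m in STRICT.finditer(text)]

print(loose)   # one match that runs from line 1 into line 2
print(strict)  # only the well-formed string on line 2
```

Without the exclusion, the unterminated quote on the first line pairs up with the opening quote on the second line, which swallows the real string.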
Do note that this expression is quite forgiving and allows many constructs that aren't valid JavaScript. See https://stackoverflow.com/a/13800082/989121 for the complete and accurate implementation.
Upvotes: 0
Reputation: 1196
Just figured it out. I was very close. Here's the solution:
/"[^\\"\n]*(\\["\\][^\\"\n]*)*"|'[^\\'\n]*(\\['\\][^\\'\n]*)*'|\/[^\\\/\n]*(\\[\/\\][^\\\/\n]*)*\//
It's very similar to thg435's answer, but I think it's a little more performant because it doesn't backtrack as much.
What I was missing was that, when looking for an escaped quote, I should also have been looking for an escaped backslash, so I changed \\" to \\["\\].
thg435's answer, by contrast, accepts any character after a backslash, which, while valid, can use up more states in the regex engine.
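A quick way to check the behaviour is to run the pattern through Python's re module. This is a sketch under assumptions: the pattern is ported from the JS literal (the `\/` escapes become plain `/`), and the helper `matches` is a made-up name.

```python
import re

# The solution pattern, ported from the JS literal to Python's re syntax.
FIXED = re.compile(
    r'"[^\\"\n]*(\\["\\][^\\"\n]*)*"'    # double-quoted string
    r"|'[^\\'\n]*(\\['\\][^\\'\n]*)*'"   # single-quoted string
    r"|/[^\\/\n]*(\\[/\\][^\\/\n]*)*/"   # regex literal
)

def matches(pattern, text):
    """Return the full text (group 0) of every match, left to right."""
    return [m.group(0) for m in pattern.finditer(text)]

# Escaped backslash before the closing quote: only the first part matches.
print(matches(FIXED, r'Dont match "Match here\\" Dont match"'))
# An escaped quote inside a single-quoted string stays inside the match.
print(matches(FIXED, r"say 'it\'s' ok"))
# Division is still picked up as a regex; this pattern cannot tell them apart.
print(matches(FIXED, "1+2/3+4/5"))
# Any escape other than a quote or a backslash makes the string unmatched.
print(matches(FIXED, r'"a\nb"'))
```

One consequence of allowing only an escaped delimiter or an escaped backslash is that a string containing some other escape, such as \n, is not matched at all. Accepting any character after the backslash, as in thg435's answer, avoids that.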
Upvotes: 0