Reputation: 32951
I need to find a substring surrounded by double quotes, for example "test", "te\"st" or "", but not """ or "\". Which of the following is the best way to achieve this?
1) /".*"/g
2) /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g
3) /"(?:\\?[\S\s])*?"/g
4) /"([^"\\]*("|\\[\S\s]))+/g
I was asked this question yesterday during an interview, and would like to know the answer for future reference.
Upvotes: 8
Views: 1746
Reputation: 413
Your grammar is a little unclear. I will assume that you want to find all strings of the form DQ [anything but DQ or \DQ]* DQ.
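The regex for this is /"([^"\\]|\\"|\\[^"])*"/g (double each backslash again if you build the pattern from a string literal instead of a regex literal).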
Upvotes: 0
Reputation: 37997
You could also get away with this simpler guy:
/("(\\"|[^"])+")/g
http://jsfiddle.net/b9chris/eMN2S/
Upvotes: 0
Reputation: 4356
These expressions evaluate as follows:
Expression 1 matches:
This would match "test" some wrong text "text" as a single match, because .* is greedy and runs on to the last double quote, and therefore fails
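To see the greedy behaviour in isolation, here is a minimal sketch in JavaScript. The input line is a made-up example wrapped around the text above; it is an assumption for illustration, not something from the question.

```javascript
// Expression 1 from the question: greedy .* between two quotes.
const greedy = /".*"/g;

// Hypothetical input containing two separate quoted strings.
const line = 'say "test" some wrong text "text" end';

// .* first runs to the end of the line, then backtracks only as far as
// the LAST quote, so both strings come back as one match.
const greedyMatches = line.match(greedy);

console.log(greedyMatches); // [ '"test" some wrong text "text"' ]
```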
Expression 2 matches:
So this collects all characters within the inverted commas in sets, broken by backslashes. It specifically excludes an inverted comma if it is preceded by a backslash by including it in any subsequent sets. This will work.
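The description above can be checked against the examples from the question. This is only a sketch: the variable names are mine, and String.raw is used so the backslashes in the samples stay exactly as typed.

```javascript
// Expression 2 from the question, piece by piece:
//   "              opening quote
//   [^"\\]*        run of characters that are neither a quote nor a backslash
//   (?:\\[\S\s]    a backslash plus whatever single character follows it...
//      [^"\\]*)*   ...then another plain run; repeated as often as needed
//   "              closing quote
const unrolled = /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g;

// The four examples from the question; the last one should NOT match.
const samples = [
  String.raw`"test"`,
  String.raw`"te\"st"`,
  String.raw`""`,
  String.raw`"\"`,
];

// Map each sample to its list of matches (empty array when there are none).
const results = samples.map((s) => s.match(unrolled) || []);

samples.forEach((s, i) => console.log(s, '=>', results[i]));
```

Because a backslash is always consumed together with the character after it, the engine has no way to backtrack and reuse an escaped quote as the closing quote.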
Expression 3 matches:
This collects all characters, each optionally preceded by a backslash, but not greedily. Because the backslash is optional, though, the engine can backtrack and treat a backslash as an ordinary character, so it also matches "\", and therefore fails
Expression 4 matches:
This will match "test"\x, and therefore fails
Conclusion:
From what I can tell, expression 2 is the only one that works (or not, as appropriate) for all the examples given. Expression 3 is simpler and handles "test", "te\"st" and "", but it also accepts "\", which the question rules out. So I'd vote for two.
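A quick way to double-check any of this is to run all four expressions against the awkward inputs. The sketch below assumes two edge cases, the lone "\" from the question and the "test"\x input mentioned for expression 4; the helper name runAll is my own.

```javascript
// The four candidate expressions from the question, keyed by number.
const candidates = {
  1: /".*"/g,
  2: /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g,
  3: /"(?:\\?[\S\s])*?"/g,
  4: /"([^"\\]*("|\\[\S\s]))+/g,
};

// Edge cases: "\" should not match at all, and in "test"\x only
// the "test" part should match.
const edgeCases = [String.raw`"\"`, String.raw`"test"\x`];

// Collect every match of every expression on every input.
function runAll(patterns, inputs) {
  const table = {};
  for (const [name, re] of Object.entries(patterns)) {
    table[name] = inputs.map((text) => text.match(re) || []);
  }
  return table;
}

const table = runAll(candidates, edgeCases);
for (const [name, rows] of Object.entries(table)) {
  rows.forEach((found, i) => console.log(name, edgeCases[i], '=>', found));
}
```

The row to watch is the lone "\": any pattern in which the backslash is optional can backtrack and treat it as an ordinary character.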
Upvotes: 2