Om3ga

Reputation: 32951

JS regular expression to find a substring surrounded by double quotes

I need to find a substring surrounded by double quotes, for example "test", "te\"st" or "", but not """ or "\". Which of the following is the best way to achieve this?

1) /".*"/g
2) /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g
3) /"(?:\\?[\S\s])*?"/g
4) /"([^"\\]*("|\\[\S\s]))+/g
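A quick way to compare the four candidates is to run each one against the sample strings from the question. The sketch below is illustrative only: the helper name `matchesWhole` is made up for this example, and it assumes a sample should count as accepted only if the pattern consumes the whole string (wrapping each pattern in `^(?:...)$`), which is one reading of "but not """".

```javascript
// The four candidate patterns from the question, copied verbatim.
const candidates = [
  /".*"/g,
  /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g,
  /"(?:\\?[\S\s])*?"/g,
  /"([^"\\]*("|\\[\S\s]))+/g,
];

// Samples from the question. In JS source '\\' is one backslash,
// so '"te\\"st"' is the text "te\"st" and '"\\"' is the text "\".
const samples = ['"test"', '"te\\"st"', '""', '"""', '"\\"'];
// Which samples the question says should be accepted.
const expected = [true, true, true, false, false];

// Hypothetical helper: true only if the pattern matches the entire text.
function matchesWhole(pattern, text) {
  // Build an anchored copy; leaving out the g flag avoids lastIndex state.
  return new RegExp(`^(?:${pattern.source})$`).test(text);
}

// One row per candidate, one column per sample.
const table = candidates.map((pattern) =>
  samples.map((text) => matchesWhole(pattern, text))
);

table.forEach((row, i) => {
  const ok = row.every((accepted, j) => accepted === expected[j]);
  console.log(`${i + 1}) ${candidates[i]} ->`, row, ok ? "as expected" : "differs");
});
```

Under this whole-string reading, pattern 2 is the only row that equals `expected`; patterns 1, 3 and 4 also accept """ and "\".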

I was asked this question yesterday during an interview, and would like to know the answer for future reference.

Upvotes: 8

Views: 1746

Answers (3)

cacba

Reputation: 413

Your grammar is a little unclear. I will assume that you want to find all strings of the form DQ [anything but DQ or \DQ]* DQ.

The regex for this is /"([^"\\]|\\"|\\[^"])*"/g
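A short sketch of this approach in JavaScript. Written as a regex literal, each backslash appears once; a doubled `\\\\` is only needed when the pattern is passed to `new RegExp` as a string. The sample sentence is made up for illustration.

```javascript
// DQ, then any mix of: a non-quote/non-backslash character, an escaped
// quote (\"), or a backslash followed by any other character, then DQ.
const quoted = /"([^"\\]|\\"|\\[^"])*"/g;

// The same pattern built from a string: every backslash is doubled.
const quotedFromString = new RegExp('"([^"\\\\]|\\\\"|\\\\[^"])*"', "g");

// '\\"' in JS source is a backslash followed by a quote in the text.
const prose = 'say "hi" and "a\\"b" but not "\\"';

// Finds "hi" and "a\"b"; the trailing "\" has no closing quote and is skipped.
console.log(prose.match(quoted));
console.log(quoted.source === quotedFromString.source); // true
```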

Upvotes: 0

Chris Moschini

Reputation: 37997

You could also get away with this simpler guy:

/("(\\"|[^"])+")/g

http://jsfiddle.net/b9chris/eMN2S/
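A short sketch of this pattern in JavaScript, run on a made-up line of text. One thing it makes visible: the `+` quantifier requires at least one character between the quotes.

```javascript
// A quote, one or more of (escaped quote | any non-quote character), a quote.
const simple = /("(\\"|[^"])+")/g;

// '\\"' in JS source is a backslash followed by a quote in the text.
const line = 'a "test" b "te\\"st" c';
console.log(line.match(simple)); // the substrings "test" and "te\"st"

// Because of the +, an empty pair of quotes is not matched.
console.log('""'.match(simple)); // null
```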

Upvotes: 0

Gareth Cornish

Reputation: 4356

These expressions evaluate as follows:

Expression 1 matches:

  • An inverted comma
  • Greedily, any characters except line breaks, including inverted commas and backslashes
  • A final inverted comma.

This would match the whole of "test" some wrong text "text" as a single string, and therefore fails.
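A minimal reproduction, using the sample text from the sentence above:

```javascript
const greedy = /".*"/g;
const sentence = '"test" some wrong text "text"';

// .* runs to the end of the line and backtracks only as far as the LAST
// quote, so the two quoted words come back as a single match.
const found = sentence.match(greedy);
console.log(found.length); // 1
console.log(found[0] === sentence); // true
```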

Expression 2 matches:

  • An inverted comma
  • Greedily, as many characters as possible that are neither an inverted comma nor a backslash
  • Greedily, as many repetitions as possible of:
    • Any character preceded by a backslash
    • Greedily, as many characters as possible that are neither an inverted comma nor a backslash
  • A final inverted comma

So this collects all characters within the inverted commas in runs broken by backslashes. An inverted comma preceded by a backslash does not end the match, because the "backslash plus any character" step consumes it as part of the string. This will work.
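A quick check of expression 2 against two of the question's samples, the escaped quote and the dangling backslash:

```javascript
// Expression 2: runs of ordinary characters, separated by backslash escapes.
const unrolled = /"[^"\\]*(?:\\[\S\s][^"\\]*)*"/g;

// '\\' in JS source is a single backslash in the text.
const escaped = '"te\\"st"'; // the text "te\"st"
const dangling = '"\\"';     // the text "\"

console.log(escaped.match(unrolled)[0] === escaped); // true: \" stays inside
console.log(dangling.match(unrolled));               // null: \" leaves no closing quote
```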

Expression 3 matches:

  • An inverted comma
  • As few repetitions as possible (lazily) of:
    • Any one character, optionally preceded by a backslash
  • A final inverted comma

This collects all characters, each optionally preceded by a backslash, lazily rather than greedily. This will work.
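A quick check of expression 3 on the same two samples from the question. The second one, "\", is the case the question says should not be matched, so it is the one worth running:

```javascript
// Expression 3: lazily repeat "optional backslash, then any character".
const lazy = /"(?:\\?[\S\s])*?"/g;

const withEscape = '"te\\"st"'; // the text "te\"st"
const loneBackslash = '"\\"';   // the text "\"

console.log(withEscape.match(lazy)[0] === withEscape); // true

// Because the backslash is optional, [\S\s] may consume a lone backslash,
// after which the final quote closes the match.
console.log(loneBackslash.match(lazy)[0] === loneBackslash); // true
```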

Expression 4 matches:

  • An inverted comma
  • One or more repetitions of:
    • Greedily, any characters that are neither an inverted comma nor a backslash
    • Then either an inverted comma, or a backslash and any character

Since nothing requires the match to end on an inverted comma, this will match "test"\x, and therefore fails.
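Reproducing that failure, with '\\' in the JavaScript source standing for one backslash in the text:

```javascript
// Expression 4: one or more of (a run of non-quote, non-backslash characters,
// then a quote or an escape). Nothing forces the match to end on a quote.
const unclosed = /"([^"\\]*("|\\[\S\s]))+/g;

const trailing = '"test"\\x'; // the text "test"\x

console.log(trailing.match(unclosed)[0] === trailing); // true: \x is swallowed too
```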

Conclusion:

From what I can tell, both expressions 2 and 3 will work. I may have missed something, but both will certainly match (or reject) the examples given as appropriate. So the question, then, is which is better. I'd vote for expression 3, because it's simpler.

Upvotes: 2
