James Donnelly

Reputation: 128856

Matching items in a comma-delimited list which aren't surrounded by single or double quotes

I'm wanting to match any instance of text in a comma-delimited list. For this, the following regular expression works great:

/[^,]+/g

(Regex101 demo).

The problem is that I'm wanting to ignore any commas which are contained within either single or double quotes, and I'm unsure how to extend the above regular expression to allow me to do that.

Here's an example string:

abcd, efgh, ij"k,l", mnop, 'q,rs't

I'm wanting to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):

  1. abcd
  2. efgh
  3. ij"k,l"
  4. mnop
  5. 'q,rs't

Or:

abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^
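For illustration, here is a minimal JavaScript check of why /[^,]+/g is not enough on its own. Run on the example string, it also breaks at the commas inside the quoted sections, producing seven pieces instead of five. The variable names are illustrative only.

```javascript
// The naive pattern: every run of non-comma characters is a match,
// so commas inside "..." and '...' split the chunks as well.
const naive = /[^,]+/g;
const example = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

const naiveChunks = example.match(naive);
console.log(JSON.stringify(naiveChunks));
// ["abcd"," efgh"," ij\"k","l\""," mnop"," 'q","rs't"]
```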

How can I do this?


Three relevant questions exist, but none of them cater for both ' and " in JavaScript:

  1. Regex for splitting a string using space when not surrounded by single or double quotes - Java solution, doesn't appear to work in JavaScript.
  2. A regex to match a comma that isn't surrounded by quotes - Only matches on "
  3. Alternative to regex: match all instances not inside quotes - Only matches on "

Upvotes: 6

Views: 440

Answers (3)

Tim007

Reputation: 2557

Try this in JavaScript

(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+

Demo
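A minimal sketch of using this pattern from JavaScript on the question's example string. The trim() call is an addition not in the answer: the matches keep the space that follows each comma.

```javascript
// Alternative 1: a chunk containing at least one complete "..." or '...' run.
// Alternative 2: any other run of characters that are not commas or newlines.
const chunkRe = /(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+/g;

const example = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// match() with the g flag returns every full match; trim off the leading spaces.
const chunks = example.match(chunkRe).map((s) => s.trim());
console.log(JSON.stringify(chunks));
// ["abcd","efgh","ij\"k,l\"","mnop","'q,rs't"]
```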

The same pattern with named groups, for readability (remove the ?<name> parts for older JavaScript engines that lack named capture groups):

(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)

Demo

Explanation:

(?<double_quotes>"[^"\n]*") matches a " followed by anything except " or a newline, then a closing " = (1) (double-quoted)
(?<single_quotes>'[^'\n]*') matches the same thing with single quotes = (2) (single-quoted)
(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*')) matches (1) or (2) = (3)
[^,"'\n]* matches any text without commas, quotes or newlines = (w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*) matches (3)(w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+ matches (3)(w) repeated = (3w+)
(?<has_quotes>[^,"'\n]*(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+) matches (w)(3w+) = (4) (has quotes); the full pattern above writes this as ((w)(3)(w))+, which matches the same text
[^,\n]+ matches every other chunk, i.e. any run without commas or newlines = (5) (simple)
So the final pattern is (4)|(5): a chunk with quotes, or a simple chunk.
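The step-by-step build-up above can be mirrored in JavaScript by assembling the unnamed pattern from string fragments, one per numbered step. The fragment names (dq, sq, quoted, w, hasQuotes, simple) are hypothetical and chosen only for this sketch; the composition follows the grouping of the pattern itself, which repeats (w)(3)(w).

```javascript
// String.raw keeps "\n" as a backslash followed by n, which RegExp reads as a newline.
const dq = String.raw`"[^"\n]*"`;               // (1) a double-quoted run
const sq = String.raw`'[^'\n]*'`;               // (2) a single-quoted run
const quoted = `(?:(?:${dq})|(?:${sq}))`;       // (3) either (1) or (2)
const w = String.raw`[^,"'\n]*`;                // (w) text without commas, quotes or newlines
const hasQuotes = `(?:(?:${w}${quoted}${w})+)`; // (4) a chunk containing quotes
const simple = String.raw`[^,\n]+`;             // (5) a chunk without quotes

const source = `${hasQuotes}|${simple}`;        // (4)|(5)
const composed = new RegExp(source, "g");

// The assembled source is identical to the unnamed pattern in the answer.
console.log(source === String.raw`(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+`);
// true
console.log(JSON.stringify('abcd, ij"k,l"'.match(composed)));
// ["abcd"," ij\"k,l\""]
```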

Input

abcd,efgh, ijkl
abcd, efgh, ij"k,l", mnop, 'q,rs't
'q, rs't
"'q,rs't, ij"k, l""

Output:

MATCH 1
simple  [0-4]   `abcd`
MATCH 2
simple  [5-9]   `efgh`
MATCH 3
simple  [10-15] ` ijkl`
MATCH 4
simple  [16-20] `abcd`
MATCH 5
simple  [21-26] ` efgh`
MATCH 6
has_quotes  [27-35] ` ij"k,l"`
double_quotes   [30-35] `"k,l"`
MATCH 7
simple  [36-41] ` mnop`
MATCH 8
has_quotes  [42-50] ` 'q,rs't`
single_quotes   [43-49] `'q,rs'`
MATCH 9
has_quotes  [51-59] `'q, rs't`
single_quotes   [51-58] `'q, rs'`
MATCH 10
has_quotes  [60-74] `"'q,rs't, ij"k`
double_quotes   [60-73] `"'q,rs't, ij"`
MATCH 11
has_quotes  [75-79] ` l""`
double_quotes   [77-79] `""`
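For comparison, a sketch that runs the named-group version in JavaScript on the same four input lines. It assumes an engine with ES2018 named capture groups and ES2020 matchAll (for example a current Node.js); the helper name chunksWithKinds is hypothetical. It reports only the full match and whether has_quotes or simple matched, which lines up with the MATCH list above.

```javascript
// The named-group pattern, unchanged; all group names are distinct.
const namedRe =
  /(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)/g;

// The four input lines, joined with newlines.
const input = [
  "abcd,efgh, ijkl",
  `abcd, efgh, ij"k,l", mnop, 'q,rs't`,
  "'q, rs't",
  `"'q,rs't, ij"k, l""`,
].join("\n");

// Hypothetical helper: one entry per match, tagged with the alternative that matched.
function chunksWithKinds(text) {
  return Array.from(text.matchAll(namedRe), (m) => ({
    kind: m.groups.has_quotes !== undefined ? "has_quotes" : "simple",
    text: m[0],
  }));
}

for (const { kind, text } of chunksWithKinds(input)) {
  console.log(kind, JSON.stringify(text));
}
```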

Upvotes: 0

Gustav Bertram

Reputation: 14905

Okay, so your matching groups can contain:

  • Runs of characters other than commas and quotes
  • A matching pair of "
  • A matching pair of '

So this should work:

/((?:[^,"']+|"[^"]*"|'[^']*')+)/g

RegEx101 Demo

As a nice bonus, you can put single quotes inside double-quoted sections, and vice versa. However, you'll probably need a state machine to handle escaped double quotes inside double-quoted strings (e.g. "aa\"aa").

Unfortunately it matches the leading space as well, so you'll have to trim the matches.
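A minimal sketch of this pattern in JavaScript, including the trimming step. The helper name splitChunks is hypothetical. The second call illustrates the bonus mentioned above: an apostrophe inside a double-quoted section does not break the chunk.

```javascript
// Each chunk is one or more pieces: a run of non-comma, non-quote characters,
// or a complete "..." section, or a complete '...' section.
const chunkRe = /((?:[^,"']+|"[^"]*"|'[^']*')+)/g;

// Hypothetical helper: match all chunks, then trim the space after each comma.
function splitChunks(text) {
  return (text.match(chunkRe) ?? []).map((s) => s.trim());
}

console.log(JSON.stringify(splitChunks(`abcd, efgh, ij"k,l", mnop, 'q,rs't`)));
// ["abcd","efgh","ij\"k,l\"","mnop","'q,rs't"]

console.log(JSON.stringify(splitChunks(`a"it's, ok",b`)));
// ["a\"it's, ok\"","b"]
```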

Upvotes: 3

anubhava

Reputation: 786291

Using a double lookahead to assert that the matched comma is outside quotes:

/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
  • (?=(([^"]*"){2})*[^"]*$) asserts that there is an even number of double quotes ahead of the matched comma.
  • (?=(([^']*'){2})*[^']*$) makes the same assertion for single quotes.
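A sketch of using the lookahead approach with split(), as the question asks. One assumption-free but easy-to-miss detail: when the separator regex given to String.prototype.split contains capturing groups, the captured values are spliced into the result array. The sketch therefore rewrites the groups as non-capturing (?:...); the matching behaviour is otherwise the same. The g flag is dropped because split() does not need it.

```javascript
// Same lookaheads, but with (?:...) so split() returns only the chunks.
const sepRe = /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

const example = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;
console.log(JSON.stringify(example.split(sepRe)));
// ["abcd","efgh","ij\"k,l\"","mnop","'q,rs't"]

// With the original capturing groups, each of the 4 separators adds 4 captured values.
const withCaptures = /(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/;
console.log(example.split(withCaptures).length);
// 21
```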

PS: This doesn't handle the case of unbalanced, nested or escaped quotes.

RegEx Demo
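To illustrate the nested-quote limitation from the PS, here is a small hypothetical input with an apostrophe inside a double-quoted section. The apostrophe makes the number of single quotes ahead of the first comma odd, so that comma is not treated as a separator even though it sits outside all quotes. The sketch uses the non-capturing form of the regex so that split() returns only the chunks.

```javascript
// Quote-counting lookaheads, written with non-capturing groups.
const sepRe = /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/;

// Hypothetical input: the ' in "it's" throws off the single-quote count.
const tricky = `x, "it's", 'y'`;
console.log(JSON.stringify(tricky.split(sepRe)));
// ["x, \"it's\"","'y'"]  (the first comma is missed)
```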

Upvotes: 2
