Reputation: 128856
I'm wanting to match any instance of text in a comma-delimited list. For this, the following regular expression works great:
/[^,]+/g
The problem is that I'm wanting to ignore any commas which are contained within either single or double quotes and I'm unsure how to extend the above selector to allow me to do that.
Here's an example string:
abcd, efgh, ij"k,l", mnop, 'q,rs't
I'm wanting to either match the five chunks of text or match the four relevant commas (so I can retrieve the data using split() instead of match()):
abcd
efgh
ij"k,l"
mnop
'q,rs't
Or:
abcd, efgh, ij"k,l", mnop, 'q,rs't
    ^     ^        ^     ^
How can I do this?
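For reference, a minimal sketch of what the plain pattern does with the example string. The variable names are only illustrative:

```javascript
// Example string from the question.
const input = `abcd, efgh, ij"k,l", mnop, 'q,rs't`;

// The plain pattern treats every comma as a delimiter, quoted or not.
const naive = input.match(/[^,]+/g);

console.log(naive.length); // 7 pieces instead of the 5 wanted
console.log(naive);
```

The two quoted commas split `ij"k,l"` and `'q,rs't` into two pieces each, which is the behaviour to avoid.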
Three relevant questions exist, but none of them cater for both ' and " in JavaScript.
Upvotes: 6
Views: 440
Reputation: 2557
Try this in JavaScript:
(?:(?:[^,"'\n]*(?:(?:"[^"\n]*")|(?:'[^'\n]*'))[^,"'\n]*)+)|[^,\n]+
Adding named groups makes it more readable (remove the ?<name> parts if your JavaScript engine does not support named groups):
(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)
Explanation:
(?<double_quotes>"[^"\n]*")
matches an opening ", any characters other than " or a newline, and a closing " = (1) (in double quotes)
(?<single_quotes>'[^'\n]*')
matches an opening ', any characters other than ' or a newline, and a closing ' = (2) (in single quotes)
(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))
matches (1) or (2) = (3)
[^,"'\n]*
matches any text that contains none of " ' , or a newline = (w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)
matches (3)(w)
(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+
matches (3)(w) repeated = (3w+)
(?<has_quotes>[^,"'\n]*(?:(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)
matches (w)(3w+) = (4) (has quotes)
[^,\n]+
matches the other case = (5) (simple)
So in the end we have (4)|(5) (has quotes or simple)
Input
abcd,efgh, ijkl
abcd, efgh, ij"k,l", mnop, 'q,rs't
'q, rs't
"'q,rs't, ij"k, l""
Output:
MATCH 1
simple [0-4] `abcd`
MATCH 2
simple [5-9] `efgh`
MATCH 3
simple [10-15] ` ijkl`
MATCH 4
simple [16-20] `abcd`
MATCH 5
simple [21-26] ` efgh`
MATCH 6
has_quotes [27-35] ` ij"k,l"`
double_quotes [30-35] `"k,l"`
MATCH 7
simple [36-41] ` mnop`
MATCH 8
has_quotes [42-50] ` 'q,rs't`
single_quotes [43-49] `'q,rs'`
MATCH 9
has_quotes [51-59] `'q, rs't`
single_quotes [51-58] `'q, rs'`
MATCH 10
has_quotes [60-74] `"'q,rs't, ij"k`
double_quotes [60-73] `"'q,rs't, ij"`
MATCH 11
has_quotes [75-79] ` l""`
double_quotes [77-79] `""`
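A minimal sketch of how the named-group pattern above could be used from JavaScript. It assumes an engine with named capture groups and `String.prototype.matchAll`. `FIELD` and `matchFields` are hypothetical names, and the `trim()` call is an addition to drop the leading spaces seen in the output above:

```javascript
// Pattern from the answer, with the named groups kept and the g flag added.
const FIELD = /(?<has_quotes>(?:[^,"'\n]*(?:(?<double_quotes>"[^"\n]*")|(?<single_quotes>'[^'\n]*'))[^,"'\n]*)+)|(?<simple>[^,\n]+)/g;

// Hypothetical helper: return every field of the text, trimmed.
function matchFields(text) {
  return Array.from(text.matchAll(FIELD), (m) => m[0].trim());
}

console.log(matchFields(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
// [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```

Each match object also exposes `m.groups.has_quotes` and `m.groups.simple`, so the caller can tell which alternative matched.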
Upvotes: 0
Reputation: 14905
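Okay, so your matching groups can contain: runs of characters that are neither a comma nor a quote, double-quoted strings, and single-quoted strings.
So this should work: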
/((?:[^,"']+|"[^"]*"|'[^']*')+)/g
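A short sketch of the pattern in use with match(). `CHUNK` and `chunks` are hypothetical names, and the matches are trimmed because the pattern also picks up the whitespace that follows each comma:

```javascript
// Pattern from the answer: one or more of (plain run | "..." | '...').
const CHUNK = /((?:[^,"']+|"[^"]*"|'[^']*')+)/g;

// Hypothetical helper: match every chunk, then strip surrounding whitespace.
function chunks(text) {
  const found = text.match(CHUNK) || []; // match() returns null when nothing matches
  return found.map((s) => s.trim());
}

console.log(chunks(`abcd, efgh, ij"k,l", mnop, 'q,rs't`));
// [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```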
As a nice bonus, you can drop extra single-quotes inside the double-quotes, and vice versa. However, you'll probably need a state machine for adding escaped double-quotes inside double quoted strings (eg. "aa\"aa").
Unfortunately it matches the initial space as well, so you'll have to trim the matches.
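For the escaped-quote case, a small state machine could look roughly like the sketch below. This is only one possible approach, not something from the regex above. `splitOutsideQuotes` is a hypothetical name, and it assumes that a backslash escapes the next character only inside quotes:

```javascript
// Hypothetical helper: split on commas that are outside single or double quotes.
function splitOutsideQuotes(text) {
  const fields = [];
  let current = "";
  let quote = null; // the quote character we are inside, or null

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (quote !== null && ch === "\\" && i + 1 < text.length) {
      // Inside quotes a backslash escapes the next character; keep both.
      current += ch + text[i + 1];
      i++;
    } else if (quote === null && (ch === '"' || ch === "'")) {
      quote = ch; // opening quote
      current += ch;
    } else if (ch === quote) {
      quote = null; // matching closing quote
      current += ch;
    } else if (ch === "," && quote === null) {
      fields.push(current.trim()); // delimiter outside quotes ends the field
      current = "";
    } else {
      current += ch;
    }
  }
  fields.push(current.trim());
  return fields;
}

console.log(splitOutsideQuotes(String.raw`"aa\"aa", b`));
// [ '"aa\\"aa"', 'b' ]
```

With an unbalanced quote, everything after the opening quote ends up in the last field, so input validation would still be needed for untrusted data.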
Upvotes: 3
Reputation: 786291
Using two lookaheads to ascertain that the matched comma is outside quotes:
/(?=(([^"]*"){2})*[^"]*$)(?=(([^']*'){2})*[^']*$)\s*,\s*/g
(?=(([^"]*"){2})*[^"]*$)
asserts that there is an even number of double quotes ahead of the matching comma.
(?=(([^']*'){2})*[^']*$)
does the same assertion for single quotes.
PS: This doesn't handle unbalanced, nested or escaped quotes.
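A sketch of the lookahead pattern used with split(). One change is assumed here: the groups are written as non-capturing `(?:...)`, because split() splices the contents of capturing groups into the result array. `DELIM` is a hypothetical name:

```javascript
// Same lookaheads as above, rewritten with non-capturing groups for split().
const DELIM = /(?=(?:(?:[^"]*"){2})*[^"]*$)(?=(?:(?:[^']*'){2})*[^']*$)\s*,\s*/g;

const parts = `abcd, efgh, ij"k,l", mnop, 'q,rs't`.split(DELIM);

console.log(parts);
// [ 'abcd', 'efgh', 'ij"k,l"', 'mnop', "'q,rs't" ]
```

Because the delimiter also consumes the whitespace around each comma, no trimming is needed. Each candidate comma triggers a scan to the end of the string, so this may be slow on very long inputs.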
Upvotes: 2