tru7

Reputation: 7212

Regex to split a string by a single character while respecting pairs of brackets and quotes

I am looking for a way to split a string at a specific character while taking some elementary syntax into account: essentially, detecting matching pairs of brackets and quotes and treating each pair as a single unit. I am not sure whether this is possible with a regexp, at least not at my level of expertise.

let str="a,pow(3,4),new Value({a:1,b:2}),'{this is a literal, all (this) is a \"single entity'";

let regexp=/what goes here?/;

let arr=str.split(regexp);

expected result:

I hope this isn't a duplicate; I was unable to find a previous answer.

Upvotes: 1

Views: 179

Answers (2)

Jan

Reputation: 43169

Unfortunately, (*SKIP)(*FAIL) is not supported in JS, but you can roughly mimic it:

  1. Define what you do not want to match and put it in an alternation.
  2. Define what you do want to match and put it in a capture group
  3. Replace the group with something that does not occur in your original string.
  4. Split by this sequence.


The expression

\([^()]*\)|'[^']*'|(,)

... and the JavaScript code:

var subject = "a,pow(3,4),new Value({a:1,b:2}),'{this is a literal, all (this) is a \"single entity'";

var regex = /\([^()]*\)|'[^']*'|(,)/g;

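// Keep parenthesized groups and quoted literals as they are; replace only the
// commas captured by group 1 with a marker that never occurs in the input.
var replaced = subject.replace(regex, function(m, group1) {
    if (typeof group1 === 'undefined') return m;
    return "SUPERMAN";
});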

console.log(replaced.split(/SUPERMAN/));
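If choosing a marker like "SUPERMAN" feels fragile (it must never appear in the input), the same regex can be used without one. Below is a minimal sketch, assuming an ES2020 runtime with `String.prototype.matchAll`; `splitTopLevel` is a hypothetical helper name. It records the index of every comma captured by group 1 and slices the string at those positions, while matches of the "skip" alternatives are simply ignored.

```javascript
// Hypothetical helper: split on commas that are not inside (...) or '...'.
// Same regex as above; group 1 only participates for "real" separators.
function splitTopLevel(subject) {
    const regex = /\([^()]*\)|'[^']*'|(,)/g; // matchAll requires the g flag
    const parts = [];
    let start = 0;
    for (const m of subject.matchAll(regex)) {
        if (m[1] === undefined) continue; // parenthesized group or quoted literal: skip it
        parts.push(subject.slice(start, m.index)); // token before this comma
        start = m.index + 1; // next token starts right after the comma
    }
    parts.push(subject.slice(start)); // token after the last separator
    return parts;
}

const subject = "a,pow(3,4),new Value({a:1,b:2}),'{this is a literal, all (this) is a \"single entity'";
console.log(splitTopLevel(subject));
```

Like `split`, this keeps empty tokens (e.g. for `"a,,b"`), and it shares the regex's limitation of not handling nested parentheses.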


See a demo for the expression on regex101.com, and read @ctwheels' comment: the above snippet won't work for nested (recursive) parentheses.
You could (not saying you should) use a recursive pattern in a regex flavor that supports recursion, e.g. PCRE:

(?:(\((?:[^()]*|(?1))*\))|'[^']*')(*SKIP)(*FAIL)|,

This supports nested parentheses as well; see a demo on regex101.com.
Otherwise, you'd need to write a small parser.

Upvotes: 4

Wiktor Stribiżew

Reputation: 626758

Since you are tokenizing a string, you can match one or more sequences of '...' substrings, (...) substrings, or single characters other than a comma.

Use

/(?:'[^']*'|\([^()]*\)|[^,])+/g

Here is the regex demo.

Details

  • (?:'[^']*'|\([^()]*\)|[^,])+ - 1 or more sequences of:
    • '[^']*' - a ', then 0+ chars other than ', and then a closing '
    • | - or
    • \([^()]*\) - a ( char, 0+ chars other than ( and ), and then )
    • | - or
    • [^,] - a char other than ,

See the JS demo:

let str="a,pow(3,4),new Value({a:1,b:2}),'{this is a literal, all (this) is a \"single entity'";
let rx = /(?:'[^']*'|\([^()]*\)|[^,])+/g;
console.log(str.match(rx));
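The flat `\([^()]*\)` alternative cannot match a group that itself contains parentheses, so on nested input the `[^,]` alternative takes over and the inner comma still acts as a separator. A quick check (the input string here is made up for illustration):

```javascript
// Same pattern as above, applied to input with nested parentheses.
const rx = /(?:'[^']*'|\([^()]*\)|[^,])+/g;
const nested = "a,pow(3,(4,5)),b";
const tokens = nested.match(rx);
console.log(tokens); // ["a", "pow(3", "(4,5))", "b"]: the nested call is split apart
```

This is why nested parentheses need the separate, parser-based approach.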

Nested parentheses approach (a simple character-by-character parser):

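function splitIt(str) {
    var result = [], start = 0, level = 0, in_quotes = false;
    for (var i = 0; i < str.length; ++i) {
        switch (str[i]) {
            case '(':
                // Opening parenthesis outside quotes: one level deeper
                if (!in_quotes) ++level;
                break;
            case ')':
                // Closing parenthesis outside quotes: one level up
                if (level > 0 && !in_quotes) --level;
                break;
            case "'":
                // Toggle the "inside a '...' literal" state
                in_quotes = !in_quotes;
                break;
            case ',':
                // Only commas at level 0 and outside quotes split the string
                if (level || in_quotes)
                    break;
                if (start < i) { // empty tokens are skipped
                    result.push(str.slice(start, i));
                }
                start = i + 1;
                break;
        }
    }

    // Push whatever follows the last top-level comma
    if (start < str.length)
        result.push(str.slice(start));

    return result;
}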

var s = "a,pow(3,(4,5)),new Value({a:1,b:2}),'{this is a literal, all (this) is a \"single entity'";
console.log(splitIt(s))
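The question mentions brackets and quotes in general, while `splitIt` only tracks `(...)` and `'...'`. A possible generalization, sketched under these assumptions: `splitTopLevelGeneral` is a hypothetical name, `()`, `[]` and `{}` share one depth counter (it does not check that each closer matches its opener), both `'` and `"` start a literal that only the same quote character ends, and quotes contain no escape sequences. Unlike `splitIt`, it keeps empty tokens.

```javascript
// Hypothetical generalization of splitIt: split on top-level separators,
// ignoring those inside (), [], {} or inside '...' / "..." literals.
function splitTopLevelGeneral(str, sep = ',') {
    const result = [];
    let start = 0;
    let depth = 0;    // combined nesting depth of (), [] and {}
    let quote = null; // the quote character we are currently inside, or null
    for (let i = 0; i < str.length; ++i) {
        const ch = str[i];
        if (quote !== null) {
            if (ch === quote) quote = null; // only the same quote closes the literal
            continue;
        }
        if (ch === "'" || ch === '"') {
            quote = ch;
        } else if ('([{'.includes(ch)) {
            ++depth;
        } else if (')]}'.includes(ch)) {
            if (depth > 0) --depth;
        } else if (ch === sep && depth === 0) {
            result.push(str.slice(start, i)); // token before this separator
            start = i + 1;
        }
    }
    result.push(str.slice(start)); // token after the last separator
    return result;
}

console.log(splitTopLevelGeneral('[1,2],{x:"a,b"},c'));
```

Tracking the active quote character, rather than a boolean, is what lets the `"` inside the question's `'...'` literal pass through without opening a new string.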

Upvotes: 2
