MarkMark

Reputation: 184

Split string with regex skipping brackets []

I have a string that I need to split on whitespace, but any whitespace inside square brackets must be left alone.

For example,

input: 'tree car[tesla BMW] cat color[yellow blue] dog'

output: ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']

If I use a simple .split(' '), it also splits inside the brackets and returns an incorrect result.

I've also tried to write a regex, but without success.

My latest attempt is .split(/(?:(?<=\[).+?(?=\])| )+/), which returns ["tree", "car[", "]", "cat", "color[", "]", "dog"]

I would be really grateful for any help.

Upvotes: 5

Views: 1374

Answers (4)

The fourth bird

Reputation: 163277

You could split on a space while asserting that what follows on the right is one or more non-whitespace characters other than square brackets, optionally followed by a group running from an opening to a closing square bracket, and then a whitespace boundary.

[ ](?=[^\][\s]+(?:\[[^\][]*])?(?!\S))

Explanation

  • [ ] Match a space (square brackets only for clarity)
  • (?= Positive lookahead
    • [^\][\s]+ Match 1+ times any char except ] [ or a whitespace char
    • (?:\[[^\][]*])? Optionally match from [ to ]
    • (?!\S) A whitespace boundary to the right
  • ) Close lookahead

Regex demo

const regex = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;
[
  "tree car[tesla BMW] cat color[yellow blue] dog",
  "tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog",
  "tree:test car[tesla BMW]",
  "tree car[tesla BMW] cat color yellow blue] dog",
  "tree car[tesla BMW] cat color[yellow blue dog"
].forEach(s => console.log(s.split(regex)));

Upvotes: 3

georg

Reputation: 214949

This is easier with match:

const input = 'tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog'

const output = input.match(/[^[\]\s]+(\[.+?\])?/g)

console.log(output)

With split you need a lookahead like this:

const input = 'tree car[tesla BMW] cat color[yellow blue] dog'

const output = input.split(/ (?![^[]*\])/)

console.log(output)

Both snippets only work if brackets are not nested, otherwise you'd need a parser rather than a regexp.
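For the nested case, here is a minimal sketch of what such a parser might look like. It is not part of the original answer: the helper name splitOutsideBrackets is hypothetical, and it assumes that only the space character separates items and that an unmatched closing bracket should simply be ignored. It walks the string once, tracks the bracket depth, and only splits on spaces seen at depth 0.

```javascript
// Hypothetical helper (an illustration, not from the answer above):
// split on spaces that are outside any [...] group, allowing nesting.
function splitOutsideBrackets(text) {
  const parts = [];
  let current = "";
  let depth = 0; // number of currently open "[" brackets

  for (const ch of text) {
    if (ch === "[") {
      depth++;
    } else if (ch === "]" && depth > 0) {
      depth--; // assumption: a stray "]" at depth 0 is kept as plain text
    }

    if (ch === " " && depth === 0) {
      // A separator outside brackets: close the current item, skip empties.
      if (current !== "") parts.push(current);
      current = "";
    } else {
      current += ch;
    }
  }

  if (current !== "") parts.push(current);
  return parts;
}

console.log(splitOutsideBrackets("tree car[tesla [S X] BMW] cat dog"));
```

With an unclosed "[", everything after it stays in one item, which may or may not be what you want.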

Upvotes: 5

Tim Biegeleisen

Reputation: 521093

Here is one option based on a regex find-all:

var input = 'tree car[tesla BMW] cat color[yellow blue] dog';
var matches = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
var output = [];
var idx1 = 0;
var idx2 = 0;

do {
    if (matches[idx1] === " ") {
        ++idx1;
        continue;
    }

    do {
        output[idx2] = output[idx2] ? output[idx2] + matches[idx1] : matches[idx1];
        ++idx1;
    } while(matches[idx1] != " " && idx1 < matches.length);
    ++idx2;
} while(idx1 < matches.length);

console.log(output);

To explain the regex: the [...] terms, which may contain spaces, are handled by trying to match them first. Next, we look for space separators, and finally for standalone words. Here is the regex:

\[.*?\]   find a [...] term
|         OR
[ ]       find a space
|         OR
\b\w+\b   find a word

This gives us the following intermediate array:

["tree", " ", "car", "[tesla BMW]", " ", "cat", " ", "color", "[yellow blue]", " ", "dog"]

Then we iterate over this array and join all non-space entries into an output array, using the actual spaces to indicate where the real separations should happen.
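The join step can also be written as a single pass, which some may find easier to follow than nested do/while loops. This is only a sketch of the same idea, not the code from the answer; the function name groupTokens is made up for illustration, and it assumes a separator token is exactly one space, as produced by the regex above.

```javascript
// Hypothetical helper: glue consecutive non-space tokens together,
// starting a new output item whenever a " " token is seen.
function groupTokens(tokens) {
  const groups = [];
  let current = "";

  for (const token of tokens) {
    if (token === " ") {
      if (current !== "") groups.push(current);
      current = "";
    } else {
      current += token;
    }
  }

  if (current !== "") groups.push(current);
  return groups;
}

const sample = "tree car[tesla BMW] cat color[yellow blue] dog";
// match() returns null when nothing matches, so fall back to an empty array.
const sampleTokens = sample.match(/\[.*?\]|[ ]|\b\w+\b/g) || [];
console.log(groupTokens(sampleTokens));
```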

Upvotes: 1

Fe Qlw

Reputation: 39

If you insist on using a regex, I recommend having a look at this page. The author splits on commas, but I believe you are smart enough to change that to spaces.

Upvotes: 0
