Reputation: 184
I have a string that I need to split on whitespace, except that whitespace inside square brackets should be left alone.
For example,
input: 'tree car[tesla BMW] cat color[yellow blue] dog'
output: ['tree', 'car[tesla BMW]', 'cat', 'color[yellow blue]', 'dog']
If I use a simple .split(' '), it also splits inside the brackets and returns an incorrect result.
I've also tried to write a regex, but without success :(
My latest attempt looks like this: .split(/(?:(?<=\[).+?(?=\])| )+/)
and it returns ["tree", "car[", "]", "cat", "color[", "]", "dog"]
I would be really grateful for any help.
Upvotes: 5
Views: 1374
Reputation: 163277
You could split on a space, asserting that what follows to the right is 1 or more non-whitespace chars other than square brackets, optionally followed by a part from an opening to a closing square bracket, and then a whitespace boundary.
[ ](?=[^\][\s]+(?:\[[^\][]*])?(?!\S))
Explanation
[ ]                  Match a space (the square brackets are only for clarity)
(?=                  Positive lookahead, assert that what is to the right is
[^\][\s]+            Match 1+ times any char except ], [ or a whitespace char
(?:\[[^\][]*])?      Optionally match from [ to ]
(?!\S)               A whitespace boundary to the right
)                    Close lookahead
const regex = / (?=[^\][\s]+(?:\[[^\][]*])?(?!\S))/g;
[
"tree car[tesla BMW] cat color[yellow blue] dog",
"tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog",
"tree:test car[tesla BMW]",
"tree car[tesla BMW] cat color yellow blue] dog",
"tree car[tesla BMW] cat color[yellow blue dog"
].forEach(s => console.log(s.split(regex)));
Upvotes: 3
Reputation: 214949
This is easier with match:
const input = 'tree car[tesla BMW] cat xml:cat xml:color[yellow blue] dog'
const output = input.match(/[^[\]\s]+(\[.+?\])?/g)
console.log(output)
With split you need a lookahead like this:
const input = 'tree car[tesla BMW] cat color[yellow blue] dog'
const output = input.split(/ (?![^[]*\])/)
console.log(output)
Both snippets only work if brackets are not nested, otherwise you'd need a parser rather than a regexp.
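If nested brackets do come up, the "parser" can be as small as a single pass that counts bracket depth. The following is only a sketch of that idea, not part of the snippets above: the function name splitOutsideBrackets is made up for illustration, and it assumes that an unmatched "]" should simply be kept as ordinary text.

```javascript
// Hypothetical helper (not from the snippets above): split on whitespace,
// but only while we are outside every pair of square brackets.
function splitOutsideBrackets(input) {
  const tokens = [];
  let current = "";
  let depth = 0; // number of "[" brackets that are currently open

  for (const ch of input) {
    if (ch === "[") {
      depth++;
    } else if (ch === "]" && depth > 0) {
      depth--; // assumption: a stray "]" with no matching "[" is plain text
    }

    if (/\s/.test(ch) && depth === 0) {
      // Whitespace outside brackets ends the current token.
      if (current !== "") tokens.push(current);
      current = "";
    } else {
      current += ch;
    }
  }

  if (current !== "") tokens.push(current);
  return tokens;
}

console.log(splitOutsideBrackets("tree car[tesla [model s] BMW] cat dog"));
// ["tree", "car[tesla [model s] BMW]", "cat", "dog"]
```

Note that with an unclosed "[", everything after it ends up in one token, which may or may not be what you want.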
Upvotes: 5
Reputation: 521093
Here is one option that uses a regex find-all approach:
var input = 'tree car[tesla BMW] cat color[yellow blue] dog';
var matches = input.match(/\[.*?\]|[ ]|\b\w+\b/g);
var output = [];
var idx1 = 0;
var idx2 = 0;
do {
if (matches[idx1] === " ") {
++idx1;
continue;
}
do {
output[idx2] = output[idx2] ? output[idx2] + matches[idx1] : matches[idx1];
++idx1;
} while(matches[idx1] != " " && idx1 < matches.length);
++idx2;
} while(idx1 < matches.length);
console.log(output);
As for the regex, we deal with the [...] terms, which might contain spaces, by trying to match them first, before any other alternative. Next, we look for space separators, and finally we look for standalone words. Here is the regex:
\[.*?\] find a [...] term
| OR
[ ] find a space
| OR
\b\w+\b find a word
This gives us the following intermediate array:
["tree", " ", "car", "[tesla BMW]", " ", "cat", " ", "color", "[yellow blue]", " ", "dog"]
Then we iterate and join together all non-space entries in an output array, using the actual spaces to indicate where the real separations should happen.
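The join step can also be written with a single loop, which may be easier to follow than the two nested do/while loops. This is just a sketch of the same idea; the names joinPieces and pieces are made up here for illustration.

```javascript
const text = "tree car[tesla BMW] cat color[yellow blue] dog";
// Same regex as above; fall back to an empty array when nothing matches.
const pieces = text.match(/\[.*?\]|[ ]|\b\w+\b/g) || [];

// Hypothetical helper: glue consecutive non-space pieces together and
// start a new output entry at every space.
function joinPieces(pieces) {
  const output = [];
  let current = "";
  for (const piece of pieces) {
    if (piece === " ") {
      if (current !== "") output.push(current);
      current = "";
    } else {
      current += piece;
    }
  }
  if (current !== "") output.push(current);
  return output;
}

console.log(joinPieces(pieces));
// ["tree", "car[tesla BMW]", "cat", "color[yellow blue]", "dog"]
```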
Upvotes: 1