Reputation: 1610
This seems a bit hairy at first blush, so I hope someone can give this a once over.
The intention is to split the string into an array of substrings while retaining the characters to split before or after as part of the substrings (i.e. nothing is lost, as it would be in a typical split). The splits should occur right after chars defined in 'endsWith' and right before chars defined in 'startsWith'.
Originally I wrote it with the 'endsWith' functionality, and it worked fine (as demonstrated further in the post), but when I added the 'startsWith' feature things started to get a bit hinky.
var input = "foo bar;baz#qux>quux,rawr";
var startsWith = ['#', ','];
var endsWith = [';', '\\s', '>'];
var re = new RegExp("(?=[" + startsWith.join('') + "])(.*?[" + endsWith.join('') + "]+)", "g");
console.log(re); //=> /(?=[#,])(.*?[;\s>]+)/g
var result = input.split(re).filter(Boolean);
console.log(result);
Result: [ 'foo bar;baz', '#qux>', 'quux,rawr' ]
Expected: [ 'foo ', 'bar;', 'baz', '#qux>', 'quux', ',rawr' ]
The problem is that it's not splitting after whitespace or semi-colons. Curiously, though, it is splitting after the greater-than symbol.
(After adding a second char to startsWith, it is clear that it is not splitting on the comma, no matter the order of '#' and ',' in the regex.)
Another interesting thing is that removing the 'startsWith' stuff and just making it:
var re = new RegExp("(.*?[" + endsWith.join('') + "]+)", "g");
console.log(re); //=> /(.*?[;\s>]+)/g
The semi-colons and whitespaces now work: [ 'foo ', 'bar;', 'baz#qux>', 'quux,rawr' ]
But I also want the startsWith functionality (having '#qux' and ',rawr' separated), and I don't understand why I'm seeing that issue when that's added back in.
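Listing where each pattern actually matches makes the difference visible. The sketch below is only a diagnostic, and the helper name listMatches is made up for illustration. With the lookahead in place, a match can only begin at a position where a startsWith character comes next, so the spaces and semi-colons earlier in the string are never reached.

```javascript
// Hypothetical helper (not part of the original code): list every match of a
// global regex together with the index where it starts.
function listMatches(re, text) {
  return Array.from(text.matchAll(re), m => ({ index: m.index, text: m[0] }));
}

const input = "foo bar;baz#qux>quux,rawr";

// Combined pattern: the lookahead requires each match to START at '#' or ','.
const lookaheadHits = listMatches(/(?=[#,])(.*?[;\s>]+)/g, input);

// endsWith-only pattern: a match may start anywhere.
const plainHits = listMatches(/(.*?[;\s>]+)/g, input);

console.log(lookaheadHits); // a single match: "#qux>" at index 11
console.log(plainHits);     // "foo " at 0, "bar;" at 4, "baz#qux>" at 8
```

The ',rawr' part never matches in either case, because no endsWith character follows it.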
Upvotes: 0
Views: 993
Reputation: 12478
Now check it
var input = "abc&foo bar;baz#qux>quux,awrr";
var re = /([#,]?[^#;>\s,]*[\;\s\>]?){1}/g
console.log(re);
var result = input.split(re).filter(Boolean);
console.log(result);
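The literal above hard-codes the characters. A minimal sketch of the same idea built from the question's startsWith and endsWith arrays might look like the following. The function name splitKeeping is an assumption for illustration, and it collects the pieces with match() rather than split(). It also assumes the lists contain nothing that is special inside a character class (such as ']', '^' or '-'), which would need escaping first.

```javascript
// Hypothetical helper: each piece is an optional start char, then a run of
// ordinary chars, then an optional end char.
function splitKeeping(text, startsWith, endsWith) {
  const starts = startsWith.join('');
  const ends = endsWith.join('');
  const re = new RegExp(
    "[" + starts + "]?[^" + starts + ends + "]*[" + ends + "]?",
    "g"
  );
  // Every part of the pattern is optional, so the final match at the end of
  // the string is empty; filter(Boolean) drops it.
  return (text.match(re) || []).filter(Boolean);
}

const startsWith = ['#', ','];
const endsWith = [';', '\\s', '>'];

const pieces = splitKeeping("foo bar;baz#qux>quux,rawr", startsWith, endsWith);
console.log(pieces);
// [ 'foo ', 'bar;', 'baz', '#qux>', 'quux', ',rawr' ]
```

Unlike the question's pattern, which uses ]+ for the ending characters, this sketch ends a piece after a single ending character, so consecutive ending characters become separate pieces.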
Upvotes: 1
Reputation: 786021
Define your re object:
var re = new RegExp("([" + startsWith.join('') + "]+.*?[" + endsWith.join('') +
"]+)|[" + endsWith.join('') + "]+");
//=> /([#,]+.*?[;\s>]+)|[;\s>]+/
This captures from # to one of the ending characters so that split returns the same captured text in the result. Otherwise it splits on the characters in the endsWith array. Then use it as:
var result = input.split(re).filter(Boolean);
//=> ["foo", "bar", "baz", "#qux>", "quux,rawr"]
Upvotes: 0
Reputation: 1865
I think this should work:
const splitChars = [' ', ';', '#', '>']
const regex = new RegExp(`(.*?(?:${splitChars.join('|')}))`)
let str = "foo bar;baz#qux>quux"
const array = str.split(regex).filter(x => x != "")
console.log(array)
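One caveat: the characters are joined with '|' unescaped, so a split character that is special in a regex (for example '.' or '|') would change the pattern's meaning. A hedged sketch of one way around that is to escape each character first. The escapeRegExp helper below is a commonly used snippet, not a built-in, and the extra split characters and sample string are made up for illustration.

```javascript
// Hypothetical helper: backslash-escape characters that are special in a regex.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// '.' and '|' are added here only to show the escaping at work.
const safeSplitChars = [' ', ';', '#', '>', '.', '|'];
const safeRegex = new RegExp(`(.*?(?:${safeSplitChars.map(escapeRegExp).join('|')}))`);

const parts = "foo.bar|baz;qux".split(safeRegex).filter(x => x !== "");
console.log(parts); // [ 'foo.', 'bar|', 'baz;', 'qux' ]
```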
Upvotes: 0