braks

Reputation: 1610

JavaScript Regex with semi-colon and whitespace

This seems a bit hairy at first blush, so I hope someone can give this a once over.

The intention is to split the string into an array of substrings while retaining the characters that are split on as part of the substrings (i.e. nothing is lost, as it would be in a typical split). The splits should occur right after the characters defined in 'endsWith' and right before the characters defined in 'startsWith'.

Originally I wrote it with the 'endsWith' functionality, and it worked fine (as demonstrated further in the post), but when I added the 'startsWith' feature things started to get a bit hinky.

var input = "foo bar;baz#qux>quux,rawr";

var startsWith = ['#', ','];
var endsWith = [';', '\\s', '>'];

var re = new RegExp("(?=[" + startsWith.join('') + "])(.*?[" + endsWith.join('') + "]+)", "g");
console.log(re); //=> /(?=[#,])(.*?[;\s>]+)/g

var result = input.split(re).filter(Boolean);
console.log(result);

Result: [ 'foo bar;baz', '#qux>', 'quux,rawr' ]

Expected: [ 'foo ', 'bar;', 'baz', '#qux>', 'quux', ',rawr' ]

The problem is that it's not splitting after whitespace or semi-colons; curiously, though, it is splitting after the greater-than symbol.

(After adding a second char to startsWith, it is clear that it is not splitting on the comma either, no matter the order of '#' and ',' in the regex.)

Another interesting thing happens when I remove the 'startsWith' stuff and reduce the pattern to:

    var re = new RegExp("(.*?[" + endsWith.join('') + "]+)", "g");
    console.log(re); //=> /(.*?[;\s>]+)/g

With that, the semi-colons and whitespace now work: [ 'foo ', 'bar;', 'baz#qux>', 'quux,rawr' ]

But I also want the startsWith functionality (having '#qux' and ',rawr' separated), and I don't understand why I'm seeing that issue when that's added back in.
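
One way to see what is going on is to list where the combined pattern can actually match. The sketch below is a diagnostic only and assumes an engine that supports String.prototype.matchAll (ES2020). The lookahead (?=[#,]) is checked at the start of each match, so a match can only begin at '#' or ',', and from there it still has to reach one of the endsWith characters.

```javascript
const input = "foo bar;baz#qux>quux,rawr";

// Same pattern as the one built from startsWith and endsWith in the question.
const re = /(?=[#,])(.*?[;\s>]+)/g;

// Collect the start index and the text of every match.
const matches = [...input.matchAll(re)].map(m => ({ index: m.index, text: m[0] }));
console.log(matches); //=> [ { index: 11, text: '#qux>' } ]
```

Only '#qux>' matches: nothing before the '#' can start a match, and ',rawr' never reaches an endsWith character. split therefore cuts only around that one match, which gives the three pieces shown in the result.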

Upvotes: 0

Views: 993

Answers (3)

Sagar V

Reputation: 12478

Now check it

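var input = "abc&foo bar;baz#qux>quux,awrr";
// Each match is an optional start char, a run of non-delimiters, and an optional end char;
// the capture group makes split keep every matched piece.
var re = /([#,]?[^#;>\s,]*[;\s>]?)/g;
console.log(re);

var result = input.split(re).filter(Boolean);
console.log(result); //=> [ 'abc&foo ', 'bar;', 'baz', '#qux>', 'quux', ',awrr' ]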

Upvotes: 1

anubhava

Reputation: 786021

Define your re object:

var re = new RegExp("([" + startsWith.join('') + "]+.*?[" + endsWith.join('') +
         "]+)|[" + endsWith.join('') + "]+");
//=> /([#,]+.*?[;\s>]+)|[;\s>]+/
  1. It uses a capture group spanning from a starting character (# or ,) to the next run of ending characters, so that split returns the captured text in the result.
  2. It uses alternation to also split on any run of the characters defined by the endsWith array.

Then use it as:

var result = input.split(re).filter(Boolean);
//=> ["foo", "bar", "baz", "#qux>", "quux,rawr"] for the input in the question
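
The alternation above consumes the endsWith characters, so they do not appear in the output. If they need to be kept, as in the question's expected result, one possible sketch (not part of the original answer) is to split on zero-width assertions instead. It assumes an engine with lookbehind support (ES2018 or later) and that every entry in the two arrays is safe to place inside a character class.

```javascript
const input = "foo bar;baz#qux>quux,rawr";
const startsWith = ['#', ','];
const endsWith = [';', '\\s', '>'];

// Split at any position that is preceded by an endsWith character
// or followed by a startsWith character. Both assertions are zero-width,
// so no characters are consumed by the split.
const re = new RegExp(
  "(?<=[" + endsWith.join('') + "])|(?=[" + startsWith.join('') + "])"
);
console.log(re); //=> /(?<=[;\s>])|(?=[#,])/

const result = input.split(re).filter(Boolean);
console.log(result); //=> [ 'foo ', 'bar;', 'baz', '#qux>', 'quux', ',rawr' ]
```

Because nothing is consumed, joining the pieces back together reproduces the original string.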

Upvotes: 0

Maciej Kozieja

Reputation: 1865

I think this should work:

const splitChars = [' ', ';', '#', '>']
const regex = new RegExp(`(.*?(?:${splitChars.join('|')}))`)
let str = "foo bar;baz#qux>quux"

const array = str.split(regex).filter(x => x != "")
console.log(array)

Upvotes: 0
