user2728841

Reputation: 1427

JavaScript regex to split a string by word boundary when not in quotes

I've got an expression held in a JS string and I want to split it into tokens. The string could contain any symbols or characters (it's actually a string expression).

I've been using

expr.split(/([^\"]\S*|\".+?\")\s*/)

But when I get a text symbol outside of quotes it splits it wrongly.

e.g. When

expr = "Tree = \"\" Or Tree = \"hello cruel world\" + \" and xyz\""

then the Or gets mixed in with the string that follows it.

Splitting on \b seems to be the way to go (is it?) but I don't know how to keep the strings in quotes together. So ideally in the above I'd get:

Tree
=
\"\"
Or
Tree
=
\"hello cruel world\"
+
\" and xyz\"

I suppose ideally I would find a tokenizer but if I could do it in regex that would be a major headache solved :)

thanks

Upvotes: 0

Views: 2335

Answers (1)

Josh Crozier

Reputation: 240938

A simpler approach is to use .match() instead of .split() and match the characters between the quotes or groups of non-whitespace characters using an alternation:

/"[^"]+"|\S+/g

Explanation:

  • "[^"]+" - Match one or more non-" characters between the double quotes...
  • | - Alternation
  • \S+ - ...or match groups of one or more non-whitespace characters
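
One caveat with the pattern above: because "[^"]+" needs at least one character between the quotes, an empty string "" is only picked up by the \S+ branch, so it can get glued to whatever non-whitespace text follows it. A minimal sketch of a variant, assuming you want empty quoted strings handled by the first branch as well (the helper name tokenize is hypothetical, not part of the original answer):

```javascript
// Hypothetical helper: same alternation as above, but "[^"]*" (star instead
// of plus) also accepts an empty pair of quotes.
function tokenize(expr) {
  // match() with the g flag returns an array of all matches, or null if none
  return expr.match(/"[^"]*"|\S+/g) || [];
}

// The example from the question tokenizes the same way as before.
const tokens = tokenize('Tree = "" Or Tree = "hello cruel world" + " and xyz"');
console.log(tokens);

// Difference from the original pattern: "" directly followed by other text.
const withPlus = 'x = ""+y'.match(/"[^"]+"|\S+/g); // third token is ""+y
const withStar = tokenize('x = ""+y');             // third token is ""
console.log(withPlus, withStar);
```

Both patterns still rely on whitespace to separate everything that is not a quoted string.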

Usage:

var string = 'Tree = \"\" Or Tree = \"hello cruel world\" + \" and xyz\"';
var result = string.match(/"[^"]+"|\S+/g);

document.querySelector('pre').textContent = JSON.stringify(result, null, 4);
<pre></pre>
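
If the operators are not always surrounded by whitespace (for example Tree="a b"+x), \S+ will keep them attached to the neighbouring text. A rough sketch that may get closer to the "split on word boundaries" idea from the question, assuming a token is a quoted string, a run of word characters, or a run of other symbols (the function name tokenizeExpr and these three token classes are assumptions, not something the question specifies):

```javascript
// Hypothetical tokenizer sketch. Alternatives are tried left to right:
//   "[^"]*"     a double-quoted string, possibly empty
//   \w+         a run of letters, digits or underscores
//   [^\s\w"]+   a run of other non-space symbols (operators, brackets, ...)
function tokenizeExpr(expr) {
  return expr.match(/"[^"]*"|\w+|[^\s\w"]+/g) || [];
}

// Same expression as in the question, with most of the spaces removed.
const compact = tokenizeExpr('Tree=""Or Tree="hello cruel world"+" and xyz"');
console.log(compact);
```

This is not a full tokenizer: adjacent symbols such as <= or =- are returned as a single token, escaped quotes inside a string are not recognised, and an unterminated quote character is silently skipped.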

Upvotes: 1
