AlvinfromDiaspar

Reputation: 6834

JavaScript Regex - ignoring certain characters between 2 chars

I need to split a string on the space character (' '), but exclude any spaces that fall between two specific delimiter characters (say, single quotes).

Here is an example string:

This-is-first-token This-is-second-token 'This is third token'

The output array should look like this:

[0] = This-is-first-token
[1] = This-is-second-token
[2] = 'This is third token'

Question: Can this be done elegantly with a regular expression?

Upvotes: 5

Views: 16233

Answers (3)

elixenide

Reputation: 44851

Short Answer:

A simple regex for this purpose would be:

/'[^']+'|[^\s]+/g

Sample code:

var data = "This-is-first-token This-is-second-token 'This is third token'";
data.match(/'[^']+'|[^\s]+/g);

Result:

["This-is-first-token", "This-is-second-token", "'This is third token'"]

Explanation:

I think this is as simple as you can make it in just a regex.

The g at the end makes it a global match, so you get all three matches. Without it, you get only the first string.

\s matches any whitespace character (in this instance, basically spaces and tabs). So it would work even if there were a tab between This-is-first-token and This-is-second-token.
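
As a quick check of the two points above, here is a minimal sketch. The input string is a hypothetical variant of the one in the question, with a tab between the first two tokens, and the variable names are only for illustration.

```javascript
// Hypothetical input: a tab (\t) separates the first two tokens.
const tabbedInput = "This-is-first-token\tThis-is-second-token 'This is third token'";

// With the g flag, match() returns every match as a plain array.
const allTokens = tabbedInput.match(/'[^']+'|[^\s]+/g);

// Without the g flag, match() stops after the first match.
const firstOnly = tabbedInput.match(/'[^']+'|[^\s]+/);

console.log(allTokens);    // all three tokens, split on the tab and the space
console.log(firstOnly[0]); // "This-is-first-token"
```

The tab is treated like any other separator because [^\s]+ stops at the first whitespace character of any kind.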

To match content in braces, use this:

data.match(/\{[^\}]+\}|[^\s]+/g);

Braces or single quotes:

data.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);
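
To see the combined pattern in action, here is a small sketch. The sample string is made up for illustration and is not from the question.

```javascript
// Hypothetical sample mixing a braced group, a quoted group, and bare tokens.
const mixedInput = "plain {in braces here} 'in quotes here' last";

// Alternatives are tried left to right: braced group, quoted group,
// then any run of non-whitespace characters.
const mixedTokens = mixedInput.match(/\{[^\}]+\}|'[^']+'|[^\s]+/g);

console.log(mixedTokens);
// ["plain", "{in braces here}", "'in quotes here'", "last"]
```

The order of the alternatives matters: if [^\s]+ came first, it would consume "{in" before the braced alternative had a chance to match.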

Upvotes: 12

Rob M.

Reputation: 36541

I came up with the following:

"This-is-first-token This-is-second-token 'This is third token'".match(/('[A-Za-z\s^-]+'|[A-Za-z\-]+)/g)
//=> ["This-is-first-token", "This-is-second-token", "'This is third token'"]

Upvotes: 1

anubhava

Reputation: 786291

You can use this split:

var string = "This-is-first-token This-is-second-token 'This is third token'";
var arr = string.split(/(?=(?:(?:[^']*'){2})*[^']*$)\s+/);
//=> ["This-is-first-token", "This-is-second-token", "'This is third token'"]

This assumes quotes are all balanced.
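
To show why the balance assumption matters, here is a rough sketch with two short, made-up inputs. The lookahead only allows a split where an even number of quotes remains up to the end of the string, so a stray quote changes which spaces qualify.

```javascript
// Same split pattern as above, stored in a variable for reuse.
const quoteAwareSplit = /(?=(?:(?:[^']*'){2})*[^']*$)\s+/;

// Balanced quotes: spaces inside the quoted part are left alone.
const balancedParts = "a b 'c d'".split(quoteAwareSplit);

// Unbalanced (hypothetical bad input): an odd number of quotes follows the
// first two spaces, so only the last space is treated as a separator.
const unbalancedParts = "a b 'c d".split(quoteAwareSplit);

console.log(balancedParts);   // ["a", "b", "'c d'"]
console.log(unbalancedParts); // ["a b 'c", "d"]
```

If the input may contain stray quotes, it is probably worth validating it before splitting.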

Upvotes: 3
