Blue Nebula

Reputation: 1174

Split a string with a RegExp that matches each substring. The whole string should be matched fully

I need to split a string. I have a regex able to match each substring entirely.

I tried using it with String.prototype.matchAll(), and it does split the string, but that function silently accepts "invalid tokens" too: pieces of the string that don't match my regex. For instance:

var re = /\s*(\w+|"[^"]*")\s*/g  // matches a word or a quoted string
var str = 'hey ??? "a"b'         // the '???' part is not a valid token
var match = str.matchAll(re)
for(var m of match){
  console.log("Matched:", m[1])
}

This gives me the tokens hey, "a" and b. Those are indeed the substrings that match my regex, but I want an error in this case, since the string contains ???, which is not a valid substring.

How can I do this?

Upvotes: 0

Views: 605

Answers (1)

Wiktor Stribiżew

Reputation: 627535

The /\s*(\w+|"[^"]*")\s*/g regex extracts multiple pattern matches from a string; it is not meant to validate the string.
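To make the skipping visible, here is a minimal sketch that records where each match starts and ends in the question's sample string. The spans variable is only an illustration; any gap between the end of one span and the start of the next is text the global search passed over without complaint.

```javascript
var re = /\s*(\w+|"[^"]*")\s*/g;
var str = 'hey ??? "a"b';
var spans = [];
for (var m of str.matchAll(re)) {
    // m.index is where the match starts; m[0] is the full matched text
    spans.push([m.index, m.index + m[0].length]);
}
console.log(spans); // [[0, 4], [7, 11], [11, 12]]
```

The first match ends at index 4 and the next one starts at index 7, so the '???' in between was never matched.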

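If you need to return true or false, you need a separate regex for validation: one that is anchored at both ends with ^ and $ and allows only valid tokens, optionally separated by whitespace, between the anchors.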

So, in your case, use the two-step approach:

  • Validate the string with /^\s*(?:(?:\w+|"[^"]*")\s*)*$/.test(text) first and then
  • If the string is valid, extract the tokens using your code, or with a slightly simpler call: const matches = text.match(/\w+|"[^"]*"/g).

See the JavaScript demo:

var extraction_re = /\w+|"[^"]*"/g;
var validation_re = /^\s*(?:(?:\w+|"[^"]*")\s*)*$/;
for (var text of ['hey "a"b', 'hey ??? "a"b']) {
    if (validation_re.test(text)) {
        console.log("Matched:", text.match(extraction_re))
    } else {
        console.log(text, "=> No Match!")
    }
}
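If you would rather get an error that points at the offending position, as the question asks, a single-pass alternative is possible with the sticky (y) flag, which forces each match to start exactly where the previous one ended. This is only a sketch of a different technique, not part of the two-step approach above; tokenize is a hypothetical helper name, and the SyntaxError message format is an assumption.

```javascript
// Hypothetical helper: returns the tokens, or throws on the first invalid one.
function tokenize(text) {
    // Treat empty or whitespace-only input as valid with no tokens,
    // to agree with the validation regex above.
    if (text.trim() === "") {
        return [];
    }
    // Sticky flag: exec() only matches at lastIndex, it never skips ahead.
    var token_re = /\s*(\w+|"[^"]*")\s*/y;
    var tokens = [];
    while (token_re.lastIndex < text.length) {
        var pos = token_re.lastIndex; // saved because a failed exec() resets it to 0
        var m = token_re.exec(text);
        if (m === null) {
            throw new SyntaxError("Invalid token at index " + pos);
        }
        tokens.push(m[1]);
    }
    return tokens;
}

console.log(tokenize('hey "a"b')); // ['hey', '"a"', 'b']
try {
    tokenize('hey ??? "a"b');
} catch (e) {
    console.log(e.message); // Invalid token at index 4
}
```

The trade-off is that the two-step version keeps validation and extraction as two plain regexes, while this loop walks the string once and can report where the input went wrong.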

Upvotes: 1
