thomasf

Reputation: 53

Regex to match all words except those in parentheses - javascript

I'm using the following regex to match all words:

mystr.replace(/([^\W_]+[^\s-]*) */g, function (match, p1, index, title) {...});

Note that words can contain special characters like German umlauts. How can I match all words except those inside parentheses?

If I have the following string:

here wäre c'è (don't match this one) match this

I would like to get the following output:

here
wäre
c'è
match
this

The trailing spaces don't really matter. Is there an easy way to achieve this with a regex in JavaScript?

EDIT: I cannot remove the text in parentheses, because the final string "mystr" must still contain it; the string operations should only be applied to the matched words. The final string in "mystr" could look like this:

Here Wäre C'è (don't match this one) Match This

Upvotes: 4

Views: 8321

Answers (2)

zx81

Reputation: 41848

Thomas, I'm resurrecting this question because it has a simple solution that wasn't mentioned and that doesn't require replacing and then matching (one step instead of two). (I found your question while researching a general question about how to exclude patterns in regex.)

Here's our simple regex (see it at work on regex101, looking at the Group captures in the bottom right panel):

\(.*?\)|([^\W_]+[^\s-]*)

The left side of the alternation matches complete parenthesized phrases, and we ignore those matches. The right side matches words and captures them to Group 1. We know they are the right words because they were not consumed by the expression on the left.
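One caveat about the word part of the pattern: in JavaScript, `\w` (and therefore `\W`) is ASCII-only, so `[^\W_]` only matches `[A-Za-z0-9]`. "wäre" and "c'è" still work because their first letter is ASCII and `[^\s-]*` picks up the rest, but a word that starts with an umlaut, such as "über", loses its first letter. The sketch below compares the two, assuming an ES2018+ engine that supports Unicode property escapes (`\p{L}`, `\p{N}`) with the `u` flag; `wordsOutsideParens` and the sample string are illustrative only.

```javascript
// ASCII-only word start: [^\W_] is equivalent to [A-Za-z0-9] in JavaScript
const asciiWords = /\(.*?\)|([^\W_]+[^\s-]*)/g;

// Assumed alternative: start a word with any Unicode letter or digit
// (needs the `u` flag and an engine with Unicode property escapes)
const unicodeWords = /\(.*?\)|([\p{L}\p{N}][^\s-]*)/gu;

// Collect Group 1 captures; parenthesized matches leave Group 1 undefined
function wordsOutsideParens(subject, regex) {
  const words = [];
  for (const m of subject.matchAll(regex)) {
    if (m[1] !== undefined) words.push(m[1]);
  }
  return words;
}

const sample = "über wäre (nicht über) c'è";
console.log(wordsOutsideParens(sample, asciiWords).join(" "));   // ber wäre c'è
console.log(wordsOutsideParens(sample, unicodeWords).join(" ")); // über wäre c'è
```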

This program shows how to use the regex (see the matches in the online demo):

<script>
var subject = 'here wäre c\'è (don\'t match this one) match this';
var regex = /\(.*?\)|([^\W_]+[^\s-]*)/g;
var group1Caps = [];
var match = regex.exec(subject);

// Put Group 1 captures in an array; matches of the left alternative
// (parenthesized text) leave Group 1 undefined and are skipped
while (match !== null) {
    if (match[1] !== undefined) group1Caps.push(match[1]);
    match = regex.exec(subject);
}

// Print one match per line (index loop instead of for...in over an array)
document.write("<br>*** Matches ***<br>");
for (var i = 0; i < group1Caps.length; i++) {
    document.write(group1Caps[i], "<br>");
}
</script>
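Since the question's EDIT requires the parenthesized text to stay in mystr, the same alternation can also be passed to `replace` with a callback: when Group 1 is undefined, the match came from the left alternative and is returned unchanged. This is a minimal sketch; `capitalize` is a hypothetical stand-in for "string operations", chosen because it reproduces the output the question asks for.

```javascript
// Left alternative: parenthesized text (kept as-is); right: a word captured in Group 1
const regex = /\(.*?\)|([^\W_]+[^\s-]*)/g;

// Hypothetical per-word operation: uppercase the first character
function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

function transformOutsideParens(str, fn) {
  // replace() passes undefined for a group that did not participate,
  // so parenthesized matches are returned unchanged
  return str.replace(regex, (match, word) => (word === undefined ? match : fn(word)));
}

const mystr = "here wäre c'è (don't match this one) match this";
console.log(transformOutsideParens(mystr, capitalize));
// Here Wäre C'è (don't match this one) Match This
```

This assumes balanced, non-nested parentheses: with input like `(a (b) c)`, the lazy `\(.*?\)` stops at the first `)`, and the trailing `c)` is then captured as a word.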

Reference

How to match (or replace) a pattern except in situations s1, s2, s3...

Upvotes: 2

Fabrizio Calderan

Reputation: 123428

Try this:

var str = "here wäre c'è (don't match this one) match this";

str.replace(/\([^\)]*\)/g, '')  // remove text inside parens (& parens)
   .match(/(\S+)/g);            // match remaining text

// ["here", "wäre", "c'è", "match", "this"]
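The replace-then-match approach above returns the expected word list, but it discards the parenthesized text, which the question's EDIT rules out. One way to keep this simple style (a different technique, not part of this answer) is `split` with a capturing group: the captured "(...)" separators are kept in the result array at odd indices, so only the even-index segments are transformed. A sketch assuming balanced, non-nested parentheses; `transformWords` is a hypothetical name, and `\S+` treats any run of non-space characters as a word, as in the answer.

```javascript
// Split around "(...)" groups; the capturing group keeps them in the result array
function transformWords(str, fn) {
  return str
    .split(/(\([^)]*\))/)
    .map((segment, i) =>
      // Odd indices are the captured parenthesized groups: leave them untouched
      i % 2 === 1 ? segment : segment.replace(/\S+/g, (word) => fn(word))
    )
    .join("");
}

const text = "here wäre c'è (don't match this one) match this";
console.log(transformWords(text, (w) => w.toUpperCase()));
// HERE WÄRE C'È (don't match this one) MATCH THIS
```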

Upvotes: 4
