Maverick

Reputation: 886

Regex multi-word boundary (exact word)

I am looking for a way to match the exact words entered, using a regex.

Unfortunately, a plain word boundary won't work because the search term can contain multiple words.

I came up with this regex (?:^|[\\s])(<word>)(?:$|[\\s!?]) and it works perfectly until several <word>s appear one right after another.

Example:

Regex: (?:^|[\\s])(won)(?:$|[\\s!?])

Text:

We won won won

In this text, it only matches every second occurrence. I understand why: the pattern requires a leading space, but that space has already been consumed as part of the previous match.
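The every-second-word behaviour can be reproduced with a minimal sketch. The second pattern is a hypothetical variant, not something from the question: it turns the trailing group into a lookahead so the delimiter is only asserted and stays available as the leading space of the next match.

```javascript
// Pattern from the question: the trailing whitespace is consumed by each match.
const consuming = /(?:^|[\s])(won)(?:$|[\s!?])/g;
// Hypothetical variant: the trailing delimiter is asserted with a lookahead, not consumed.
const asserting = /(?:^|[\s])(won)(?=$|[\s!?])/g;

const text = "We won won won";
// Collect only the captured word (group 1) from every match.
const consumingHits = [...text.matchAll(consuming)].map(m => m[1]);
const assertingHits = [...text.matchAll(asserting)].map(m => m[1]);

console.log(consumingHits.length); // 2
console.log(assertingHits.length); // 3
```

The variant still matches the leading space, so the word itself has to be read from group 1.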

There are more difficulties with this.

It shouldn't match contractions: won shouldn't match won't. The same applies to hyphenated words such as won-me.

To make this simple I made unit tests for testing all the cases:

https://regex101.com/r/9Mj0UC/4/tests

Note: the unit tests can't check whether it matches every occurrence or only every second one. Therefore, please simply look at the test string panel.

Can someone provide a solution for this Regex madness?

The solution needs to be a regex that is compatible with JavaScript.

Upvotes: 0

Views: 170

Answers (4)

vsemozhebuty

Reputation: 13772

What about this way (without lookbehind):

/(?:^|(?!['-])[^]\b)won(?!\B|['-])/i
  1. Start of the line, or any character except ' or - followed by a word boundary.
  2. The word.
  3. A negative lookahead that rejects a non-word-boundary, ' or - right after the word. (It does not consume the following space, so repeated words are all matched.)
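As a rough check of the three parts above, the pattern can be run against a few of the question's cases. The g flag and the countOf helper are assumptions added for illustration so that every occurrence is counted.

```javascript
// Regex from this answer, with the g flag added (an assumption) to collect every match.
const rx = /(?:^|(?!['-])[^]\b)won(?!\B|['-])/gi;

// Hypothetical helper: number of matches of rx in a string.
const countOf = s => (s.match(rx) || []).length;

const samples = ["We won won won", "I won't win", "He won-me", "We non-won match", "We won, finally"];
for (const s of samples) {
  console.log(JSON.stringify(s), countOf(s));
}
```

Each match includes the character before the word (for example " won"), so wrap the word in a capture group if only the word itself is needed.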

Upvotes: 1

Jan

Reputation: 43169

You could use the following expression:

(\w+-)?won(?![-'])

Additionally, you need to check if the first group is empty programmatically, see a demo on regex101.com.

For engines supporting lookbehinds (Chrome and the like), you could even use

(?<!\w-)won(?![-'])

See a demo on regex101.com as well.


The first could be done in JS like so:
let strings = ["I won't win", "won", "I won", "You won", "We won, finally", "Have we won?", "We won!", "We non-won match", "He won-me"];

// The optional group captures a hyphenated prefix such as "non-".
let rx = /(\w+-)?won(?![-'])/;
strings.forEach(function(item) {
    const m = rx.exec(item);
    // Accept the match only when no hyphenated prefix was captured.
    if (m !== null && typeof m[1] === 'undefined')
        console.log(item);
});

Upvotes: 1

Swaroop Deval

Reputation: 906

Use a positive lookbehind and a positive lookahead for whitespace. Below is the regex.

// check that there is whitespace before and after the word

let regex = /(?<=\s)won(?=\s)/g;

console.log("We won won won't won no-won".match(regex));
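One caveat worth sketching: requiring whitespace on both sides skips a word at the very start or end of the string. The second pattern below is an assumed variant, not part of this answer, that uses negative lookarounds for non-whitespace instead, so the string edges also count as delimiters.

```javascript
// Pattern from this answer: needs an actual whitespace character on both sides.
const strict = /(?<=\s)won(?=\s)/g;
// Assumed variant: only forbids a non-whitespace neighbour, so string edges are accepted.
const edges = /(?<!\S)won(?!\S)/g;

const text = "won won won";
const strictCount = (text.match(strict) || []).length;
const edgesCount = (text.match(edges) || []).length;

console.log(strictCount); // 1 (only the middle word)
console.log(edgesCount);  // 3
```

Neither pattern accepts trailing punctuation such as "won?" or "won!", so those cases from the question would need the punctuation added to the lookahead.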

Upvotes: 0

FZs

Reputation: 18619

Simply use \b to match a word boundary, with lookarounds to exclude hyphens:

console.log("We won won won no-won won-with-hyphen".match(/(?<!-)\b(won)\b(?!-)/g))

Regex101.com example
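Since \b also sits between a letter and an apostrophe, this pattern would still find won inside won't. Here is a sketch of one way to cover that case and multi-word terms at the same time, using lookarounds in place of \b. The exactTermRegex helper is hypothetical, and it needs an engine with lookbehind support.

```javascript
// Hypothetical helper: builds an exact-match regex for a user-entered term.
function exactTermRegex(term) {
  // Escape regex metacharacters so the term is matched literally.
  const escaped = term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Reject matches glued to a word character, apostrophe, or hyphen on either side.
  return new RegExp(`(?<![\\w'-])${escaped}(?![\\w'-])`, "g");
}

const sample = "We won won won't won no-won won-with-hyphen";
const hits = sample.match(exactTermRegex("won"));
console.log(hits); // three "won" entries

// Multi-word terms work the same way because no \b is tied to a single word.
const phraseHits = "Have we won?".match(exactTermRegex("we won"));
console.log(phraseHits);
```

Nothing is consumed around the term, so adjacent repetitions are all found and each match is exactly the term.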

Upvotes: 0
