Rajashree

Reputation: 61

RegExp Boundary is not considering letters after Special Characters

Requirement:

I need to find a searched word in a sentence, so I am using the RegExp word boundary (\b) for that.

Note: I need to match WHOLE WORD.

The issue I am facing:

When I use the RegExp word boundary to search for a word in a sentence, it treats a special character as the end of the word and ignores the letters after it. For example, the string below contains the whole word Greek only once (the other occurrence is part of Greek's), but the RegExp reports 2 matches.

"The particularly mysteries, which honored the Greek's goddess Demeter Greek."

Code Snippet:

word: string = "Greek";
sentence: string = "The particularly mysteries, which honored the Greek's goddess Demeter Greek.";
isWordThere: boolean = false;
searchedValue: any = [];

constructor() {
    const regex = new RegExp('\\b' + this.word + '\\b', 'g');
    this.isWordThere = regex.test(this.sentence);
    this.searchedValue = this.sentence.match(regex);
    console.log(this.searchedValue);
}

What changes can I make to match the whole word? Or what else can I do to achieve the requirement?
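For anyone reproducing this, here is a minimal sketch of why the count comes out as 2. It assumes nothing beyond the sentence from the question; the variable names are only illustrative. \b treats only [A-Za-z0-9_] as word characters, so the apostrophe in Greek's counts as a boundary.

```typescript
// \b sits between a word character ([A-Za-z0-9_]) and a non-word character.
// The apostrophe is a non-word character, so there is a boundary
// between "k" and "'" in "Greek's".
const sentence: string =
  "The particularly mysteries, which honored the Greek's goddess Demeter Greek.";

const boundaryRegex: RegExp = /\bGreek\b/g;

// Collect the character that follows each match.
const followingChars: string[] = [];
let m: RegExpExecArray | null;
while ((m = boundaryRegex.exec(sentence)) !== null) {
  followingChars.push(sentence.charAt(m.index + m[0].length));
}

console.log(followingChars.length); // 2
console.log(followingChars);        // [ "'", "." ]
```

The first match is followed by an apostrophe, which is the occurrence that should be rejected.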

Upvotes: 1

Views: 137

Answers (4)

Ryszard Czech

Reputation: 18611

Use a shorter pattern based on the Unicode letter class \p{L}:

RegExp("(?<![-'\\p{L}0-9])" + this.word + "(?![-'\\p{L}0-9])", 'gu');

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    [-'\p{L}0-9]           any character of: '-', ''', a Unicode letter, digits
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
foo                        the search word (this.word in the snippet above)
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    [-'\p{L}0-9]           any character of: '-', ''', a Unicode letter, digits
--------------------------------------------------------------------------------
  )                        end of look-ahead
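
A minimal sketch of how the explained pattern could be wrapped up, assuming a runtime with lookbehind and Unicode property escapes (ES2018 or later). The helpers escapeRegExp and wholeWordRegex are hypothetical names introduced here, not part of the question's code; the escaping step is an extra precaution in case the search word contains regex metacharacters.

```typescript
// Hypothetical helper: escape regex metacharacters in the search word.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the lookaround-based whole-word pattern.
// A match is rejected when it touches '-', an apostrophe, a Unicode letter or a digit.
function wholeWordRegex(word: string): RegExp {
  const wordChar = "[-'\\p{L}0-9]";
  return new RegExp(
    "(?<!" + wordChar + ")" + escapeRegExp(word) + "(?!" + wordChar + ")",
    "gu"
  );
}

const sentence: string =
  "The particularly mysteries, which honored the Greek's goddess Demeter Greek.";

// Only the final "Greek" (followed by ".") survives; "Greek's" is rejected.
const matches: string[] = sentence.match(wholeWordRegex("Greek")) || [];
console.log(matches.length); // 1

// Accented letters count as letters here: "caf" inside "caf\u00e9" is rejected.
const accented: string[] = "caf\u00e9 caf".match(wholeWordRegex("caf")) || [];
console.log(accented.length); // 1
```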

Upvotes: 0

Rajashree

Reputation: 61

With a few modifications to @chris-maurer's answer, I was able to meet my requirement. The RegExp that works for me is shown below.

RegExp("(?<![-'0-9a-zÀ-ÿœēčŭ])" + this.word + "(?![-'0-9a-zÀ-ÿœēčŭ])", 'g');

Upvotes: 1

Chris Maurer

Reputation: 2547

Instead of \b, which is a zero-width assertion that only treats [A-Za-z0-9_] as word characters, you will want a general negative lookahead (and you might as well include the same thing in a negative lookbehind):

    RegExp("(?<![a-z'])" + this.word + "(?![a-z'])", 'gi')

This assumes your 'words' consist only of letters. I also changed it to ignore case, so it matches either Greek or greek. It will not match Greek's, Greeks, or fenugreek. It will match Greek and the greek in greek-specific. If you change the example to search for "all" in the sentence "Y'all should not take all the cookies.", it won't match the all in Y'all but will match the standalone all.
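
The cases listed above can be checked with a short sketch. It assumes a runtime that supports lookbehind; lettersOnlyRegex is a hypothetical helper name used only for this illustration.

```typescript
// Hypothetical helper: the lookaround pattern with letters and apostrophe only.
// With the "i" flag, [a-z] also covers A-Z.
function lettersOnlyRegex(word: string): RegExp {
  return new RegExp("(?<![a-z'])" + word + "(?![a-z'])", "gi");
}

const samples: string[] = ["Greek", "greek-specific", "Greek's", "Greeks", "fenugreek"];

// A fresh regex per sample avoids carrying lastIndex between test() calls.
const matched: string[] = samples.filter((s) => lettersOnlyRegex("Greek").test(s));
console.log(matched); // [ "Greek", "greek-specific" ]

// The "all" example: the one inside "Y'all" is rejected by the lookbehind.
const allMatches: string[] =
  "Y'all should not take all the cookies.".match(lettersOnlyRegex("all")) || [];
console.log(allMatches.length); // 1
```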

Upvotes: 1

Pakoco

Reputation: 31

I would add something like [']?[a-zA-Z]+ to your RegEx, where ['] holds the language's special characters...

RegExp('\\b' + this.word + '[\']?[a-zA-Z]+\\b', 'g');

Upvotes: 0
