Yashwardhan Pauranik

Reputation: 5566

Regex is accepting words with symbols

I need to use a JavaScript regex to test whether a string contains a particular word or letter.

Sample strings would be:

// In the first 3 strings, I need to check for the letter "C"
C is language is required.     
We need a C language dev.
Looking for a dev who knows C!

// Keyword is Artificial Intelligence
We need looking for someone who knows Artificial Intelligence.

To check for these, I created the following regex:

['C', 'Artificial Intelligence', 'D', 'Angular', 'JS'].forEach((item) => {
  const baseRex = /[!,.?": ]?/g;
  // Builds /[!,.?": ]?<item>[!,.?": ]?/ (only .source is used, so the g flag is dropped)
  const finalRex = new RegExp(baseRex.source + item + baseRex.source);

  // The expected results below are for the first iteration only, i.e. item === 'C'.
  console.log(finalRex.test('C is required'));                  // true
  console.log(finalRex.test('Looking for a dev who knows C!')); // true
  console.log(finalRex.test('We need a C language dev.'));      // true
  console.log(finalRex.test('Computer needed'));                // also true, which is wrong!
});

I don't want words that merely contain the letter C (such as "Computer") to be counted as a match.

Upvotes: 4

Views: 133

Answers (2)

crifan

Reputation: 14328

For C:

Input:

C is language is required.     
We need a C language dev.
Looking for a dev who knows C!
Computer needed
invalidC should not match
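  • JS regex: (?<!\w)C(?!\w)
  • Match result:
    • Chrome: (screenshot of the matches not preserved)
    • Safari: did not support lookbehind at the time of writing (support was added in Safari 16.4)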
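A minimal JavaScript sketch of the lookaround pattern, run against the same inputs. It assumes an engine with lookbehind support (for example a recent Node.js or Chrome); cRegex and cInputs are just illustrative names.

```javascript
// (?<!\w) and (?!\w): "C" must not be preceded or followed by a word
// character ([A-Za-z0-9_]); the start and end of the string also qualify.
const cRegex = /(?<!\w)C(?!\w)/;

const cInputs = [
  "C is language is required.",
  "We need a C language dev.",
  "Looking for a dev who knows C!",
  "Computer needed",
  "invalidC should not match",
];

for (const s of cInputs) {
  // Prints true for the first three lines and false for the last two.
  console.log(cRegex.test(s), s);
}
```

The pattern is case-sensitive, so a lowercase "c" never matches; add the i flag if that is wanted.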

Extended to match either C or Artificial Intelligence:

Input:

C is language is required.     
We need a C language dev.
Looking for a dev who knows C!
Computer needed
invalidC should not match
We need looking for someone who knows Artificial Intelligence.
not matchArtificial Intelligence
  • Regex: (?<!\w)((C)|(Artificial Intelligence))(?!\w)
  • Match result:
    • Chrome: (screenshot of the matches not preserved)
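The same idea in JavaScript for the combined pattern, reporting which keyword matched on each input line. This is a sketch that again assumes lookbehind support; cOrAiRegex and mixedInputs are illustrative names.

```javascript
// Same lookaround guards around an alternation of the two keywords.
// Group 1 captures whichever keyword matched.
const cOrAiRegex = /(?<!\w)((C)|(Artificial Intelligence))(?!\w)/;

const mixedInputs = [
  "C is language is required.",
  "We need a C language dev.",
  "Looking for a dev who knows C!",
  "Computer needed",
  "invalidC should not match",
  "We need looking for someone who knows Artificial Intelligence.",
  "not matchArtificial Intelligence",
];

for (const s of mixedInputs) {
  const m = s.match(cOrAiRegex); // null when neither keyword stands alone
  console.log(m ? `matched "${m[1]}":` : "no match:", s);
}
```

Groups 2 and 3 are not needed for this check; writing them as (?:C|Artificial Intelligence) inside group 1 would behave the same.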

Note

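For more about lookahead and lookbehind, see my summary:

And my tutorial (in Chinese): 环视断言 (Lookaround Assertions) · 应用广泛的超强搜索:正则表达式 (Regular Expressions: A Widely Used, Powerful Search Tool)

And an overview of all regex rules: 一图让你看懂和记住所有正则表达式规则 (Understand and Remember All Regex Rules in One Diagram)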

Upvotes: 0

Sweeper

Reputation: 273265

After concatenating baseRex on both sides of the keyword, the regex is:

[!,.?": ]?C[!,.?": ]?

Notice that [!,.?": ]? can match 0 or 1 characters. In "Computer", both occurrences of [!,.?": ]? match 0 characters and C matches the "C", so the whole regex matches.
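To see this concretely, the sketch below rebuilds the question's pattern for "C" (leaving out the g flag, which .source does not carry over anyway) and runs exec on "Computer needed", showing that the match is the bare "C" at index 0.

```javascript
// Rebuild the question's pattern for "C".
const baseRex = /[!,.?": ]?/;
const finalRex = new RegExp(baseRex.source + "C" + baseRex.source);

console.log(finalRex.source); // [!,.?": ]?C[!,.?": ]?

// Both optional classes match the empty string, so the match is the bare
// "C" at the start of "Computer".
const m = finalRex.exec("Computer needed");
console.log(JSON.stringify(m[0]), m.index); // "C" 0
```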

Presumably, you added ? there so that it works at the start and end of the string, where there is no delimiter character to match. However, you should use ^ and $ for the start and end instead. Your whole regex should be:

(?:[!,.?": ]|^)C(?:[!,.?": ]|$)

You can also replace the character class with \W, which means [^0-9a-zA-Z_].
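A quick JavaScript check of the anchored pattern and its \W variant. The "C/C++ dev" sample is an extra input added here (not from the question) to show where the two differ: "/" is not in the explicit character class, but it is matched by \W.

```javascript
// Delimiter or start of string, then the keyword, then delimiter or end of string.
const withClass = /(?:[!,.?": ]|^)C(?:[!,.?": ]|$)/;
// Same idea with \W (any character other than [0-9a-zA-Z_]) as the delimiter.
const withNonWord = /(?:\W|^)C(?:\W|$)/;

const samples = [
  "C is required",
  "Looking for a dev who knows C!",
  "We need a C language dev.",
  "Computer needed",
  "C/C++ dev",
];

for (const s of samples) {
  console.log(withClass.test(s), withNonWord.test(s), s);
}
```

One caveat with both forms: they consume the delimiter characters, so with the g flag two keywords that share a single delimiter (as in "C,C") are not both found.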

In fact, you don't need any of this! There is a useful zero-width assertion called a "word boundary", \b, which seems to be exactly what you want here. Your base regex can simply be:

\b

It matches only at a position between a \w and a \W character (in either order), or at the start or end of the string when the adjacent character is a \w.
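Applied to the question's keyword loop, a \b-based version might look like the sketch below. escapeRegExp and keywordRegex are hypothetical helpers introduced here; the escaping step is not part of the answer and only matters for keywords that contain regex metacharacters.

```javascript
// Hypothetical helper: escape regex metacharacters so the keyword is
// matched literally (e.g. "C++" becomes "C\+\+").
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: \b is zero-width, so it consumes no characters and
// also matches at the start or end of the string next to a word character.
function keywordRegex(keyword) {
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`);
}

const keywords = ["C", "Artificial Intelligence", "D", "Angular", "JS"];

console.log(keywordRegex("C").test("Looking for a dev who knows C!")); // true
console.log(keywordRegex("C").test("Computer needed")); // false

const text = "We need looking for someone who knows Artificial Intelligence.";
console.log(keywords.filter((k) => keywordRegex(k).test(text))); // [ 'Artificial Intelligence' ]
```

Note that \b only behaves as intended when the keyword begins and ends with a word character: for a keyword such as "C++", the trailing \b would require a word character right after the final "+", so "C++ dev" would not match.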

Upvotes: 2
