Reputation: 5566
I need to test whether a string contains a particular word or letter using a JavaScript regex.
Sample strings would be:
// In the first 3 strings, the standalone letter "C" should be detected
C is language is required.
We need a C language dev.
Looking for a dev who knows C!
// Keyword is Artificial Intelligence
We need looking for someone who knows Artificial Intelligence.
To check for the above, I created a regex:
['C', 'Artificial Intelligence', 'D', 'Angular', 'JS'].forEach((item) => {
  const baseRex = /[!,.?": ]?/g;
  // Resulting pattern: /[!,.?": ]?<C/D/Angular...>[!,.?": ]?/
  const finalRex = new RegExp(baseRex.source + item + baseRex.source);
  // Consider only the first iteration, where item is 'C'.
  console.log(finalRex.test('C is required')); // true
  console.log(finalRex.test('Looking for a dev who knows C!')); // true
  console.log(finalRex.test('We need a C language dev.')); // true
  console.log(finalRex.test('Computer needed')); // also returns true, which is wrong!
});
I don't want words that merely contain the letter C to be counted as matches.
Upvotes: 4
Views: 133
Reputation: 14328
C
input:
C is language is required.
We need a C language dev.
Looking for a dev who knows C!
Computer needed
invalidC should not match
(?<!\w)C(?!\w)
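This uses a negative lookbehind, (?<!\w), and a negative lookahead, (?!\w), so C matches only when it is not adjacent to a word character.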
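As a rough check of the pattern above, the following sketch runs it against the listed inputs. The variable names are illustrative only, and it assumes a JavaScript engine that supports lookbehind (ES2018 or later).

```javascript
// "C" must not have a word character directly before or after it.
const standaloneC = /(?<!\w)C(?!\w)/;

const samples = [
  'C is language is required.',
  'We need a C language dev.',
  'Looking for a dev who knows C!',
  'Computer needed',
  'invalidC should not match',
];

// One boolean per sample, in the same order as the list above.
const results = samples.map((text) => standaloneC.test(text));
console.log(results); // [ true, true, true, false, false ]
```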
C or Artificial Intelligence
input:
C is language is required.
We need a C language dev.
Looking for a dev who knows C!
Computer needed
invalidC should not match
We need looking for someone who knows Artificial Intelligence.
not matchArtificial Intelligence
(?<!\w)((C)|(Artificial Intelligence))(?!\w)
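One possible way to build this kind of alternation from a keyword list is sketched below. The helper names escapeRegExp and found are hypothetical (not from the original answer), and a non-capturing group is used in place of the capturing groups shown above.

```javascript
// Hypothetical helper: escape regex metacharacters so keywords match literally.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

const keywords = ['C', 'Artificial Intelligence'];
const alternation = keywords.map(escapeRegExp).join('|');

// Same lookaround idea as above: no word character on either side of the keyword.
const keywordRex = new RegExp(`(?<!\\w)(?:${alternation})(?!\\w)`);

// Returns the matched keyword, or null when nothing matches.
const found = (text) => {
  const match = keywordRex.exec(text);
  return match ? match[0] : null;
};

console.log(found('We need looking for someone who knows Artificial Intelligence.')); // Artificial Intelligence
console.log(found('Looking for a dev who knows C!')); // C
console.log(found('not matchArtificial Intelligence')); // null
console.log(found('Computer needed')); // null
```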
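For more about lookahead and lookbehind, you can refer to my summary and my (Chinese) tutorial: "Lookaround Assertions · Widely Used, Super-Powerful Search: Regular Expressions" (环视断言 · 应用广泛的超强搜索:正则表达式).
For regex in general, see: "One Picture to Help You Understand and Remember All Regular Expression Rules" (一图让你看懂和记住所有正则表达式规则).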
Upvotes: 0
Reputation: 273265
After concatenation with baseRex, the resulting regex is:
[!,.?": ]?C[!,.?": ]?
Notice that [!,.?": ]? can match 0 or 1 characters. In Computer, both [!,.?": ]? subpatterns match 0 characters, and C matches C, causing the whole regex to match.
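To make the zero-width behaviour visible, this small sketch inspects what the concatenated pattern actually consumes. The variable names are illustrative only.

```javascript
// The pattern produced by the concatenation, with both character classes optional.
const optionalRex = /[!,.?": ]?C[!,.?": ]?/;

// In "Computer", both optional classes match nothing, so only "C" is consumed.
const inComputer = optionalRex.exec('Computer needed');
console.log(JSON.stringify(inComputer[0]), inComputer.index); // "C" 0

// In a real sentence the surrounding spaces are consumed as well.
const inSentence = optionalRex.exec('We need a C language dev.');
console.log(JSON.stringify(inSentence[0]), inSentence.index); // " C " 9
```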
Presumably, you added ? so that the regex works at the start and end of the string, where there are no characters to be matched. However, you should use ^ and $ for the start and end instead. Your whole regex should be:
(?:[!,.?": ]|^)C(?:[!,.?": ]|$)
You can also replace the character class with \W, which means [^0-9a-zA-Z_].
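A quick sketch comparing the anchored pattern with its \W variant follows. The variable names and the last sample string are made up for illustration.

```javascript
// Explicit separator class, with ^ and $ covering the string edges.
const anchoredRex = /(?:[!,.?": ]|^)C(?:[!,.?": ]|$)/;
// Same idea, but any non-word character counts as a separator.
const nonWordRex = /(?:\W|^)C(?:\W|$)/;

console.log(anchoredRex.test('C is required')); // true
console.log(anchoredRex.test('Looking for a dev who knows C!')); // true
console.log(anchoredRex.test('Computer needed')); // false

// Parentheses are not in the explicit class, but they are matched by \W.
console.log(anchoredRex.test('We use (C) here')); // false
console.log(nonWordRex.test('We use (C) here')); // true
console.log(nonWordRex.test('Computer needed')); // false
```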
In fact, you don't actually need to do all of this! There is a useful 0-width matcher called "word-boundary", \b, which seems to be exactly the thing you want here. Your base regex can just be:
\b
It matches only at the boundary between a \w and a \W, or between a \W and a \w. The start and end of the string also count as a boundary when the adjacent character is a \w.
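Applied to the loop from the question, the \b approach might look like the sketch below. The helpers escapeForRegex and containsWord are hypothetical names introduced here; escaping the keyword is an added precaution, not part of the original answer.

```javascript
// Hypothetical helper: escape regex metacharacters so keywords match literally.
const escapeForRegex = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Wrap the keyword in word boundaries instead of an optional character class.
const containsWord = (text, keyword) =>
  new RegExp(`\\b${escapeForRegex(keyword)}\\b`).test(text);

console.log(containsWord('C is required', 'C')); // true
console.log(containsWord('Looking for a dev who knows C!', 'C')); // true
console.log(containsWord('We need a C language dev.', 'C')); // true
console.log(containsWord('Computer needed', 'C')); // false
console.log(containsWord('We need someone who knows Artificial Intelligence.', 'Artificial Intelligence')); // true
```

Note that \b relies on the keyword starting and ending with a word character. For a keyword such as C++, the trailing \b would not match before a period or space, so containsWord('I know C++.', 'C++') returns false.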
Upvotes: 2