Mahender

Reputation: 5664

Regex: how to check word boundary conditions in Unicode text

I am trying to check whether a given word (say, matchword) occurs in a sentence that comes from an external source. In C#, I am currently planning to use the regex pattern below to cover the word-boundary scenarios: matchword should be a single word, delimited by any of the usual sentence- or word-breaking characters. matchword can be at the beginning, middle, or end of the sentence, or it can sometimes be an exact match of the whole string.

The check should cover multilingual text and be case-insensitive.

([\s+,"'\(\[])matchword([\s+;\?\.;,"'\)\]])

An example,

Assume my matchword is "test" (without quotes)

and sample sentences are:

this is test, string -- Result - true

this is testing -- Result - false

this is testest -- Result - false

Test -- Result - true

Upvotes: 0

Views: 842

Answers (2)

Diego D

Reputation: 8160

I guess negative look-around may be enough in your case:

(?<!\w)test(?!\w)

That means: the word test, neither preceded nor followed by a \w (word) character.

If you want to make the expression case-insensitive in C#, use the RegexOptions.IgnoreCase flag, as in the following example:

Regex.IsMatch(subjectString, @"(?<!\w)test(?!\w)", RegexOptions.IgnoreCase)

A look-around tutorial explains the concept in more detail. That said, the other answer, using \b, is much better in your case. Treat look-around as something worth studying further if you want to master regular expressions: its power lies in the ability to put more complex expressions inside the look-ahead or look-behind groups. In your case it's overkill.
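Since matchword comes from outside, it may help to build the look-around pattern at run time instead of hard-coding test. The following is only a minimal sketch: the class and method names (WordMatcher, ContainsWord) are made up for illustration, and adding RegexOptions.CultureInvariant is my own assumption to keep case folding independent of the current culture. Regex.Escape is used so that characters such as . or + inside matchword are treated literally.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordMatcher
{
    // Hypothetical helper: returns true when matchWord occurs in sentence
    // without a \w character directly before or after it.
    public static bool ContainsWord(string sentence, string matchWord)
    {
        // Escape the word so regex metacharacters in it are matched literally.
        string pattern = @"(?<!\w)" + Regex.Escape(matchWord) + @"(?!\w)";

        // IgnoreCase comes from the answer above; CultureInvariant is an
        // assumption added here, not part of the original answer.
        return Regex.IsMatch(
            sentence,
            pattern,
            RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);
    }
}
```

With the sample sentences from the question, WordMatcher.ContainsWord(sentence, "test") should give true for "this is test, string" and "Test", and false for "this is testing" and "this is testest".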

Upvotes: 1

Jason

Reputation: 3960

Try \btest\b, where \b denotes a word boundary (the beginning or end of a word), or use (?i)\btest\b to make it case-insensitive.
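On the multilingual part of the question: in .NET, \w and \b are Unicode-aware by default (unless RegexOptions.ECMAScript is set), so \b should also work for non-Latin scripts. Here is a rough sketch of the \b approach with the word supplied at run time. BoundaryMatcher and ContainsWord are hypothetical names, and the Russian sentences are just made-up test data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BoundaryMatcher
{
    // Hypothetical helper: whole-word, case-insensitive search built on \b.
    public static bool ContainsWord(string sentence, string matchWord)
    {
        // (?i) switches on case-insensitive matching inline;
        // Regex.Escape keeps metacharacters in matchWord literal.
        string pattern = @"(?i)\b" + Regex.Escape(matchWord) + @"\b";
        return Regex.IsMatch(sentence, pattern);
    }
}

public static class BoundaryDemo
{
    public static void Run()
    {
        // Sample sentences from the question.
        Console.WriteLine(BoundaryMatcher.ContainsWord("this is test, string", "test"));
        Console.WriteLine(BoundaryMatcher.ContainsWord("this is testing", "test"));

        // Made-up non-Latin samples to exercise Unicode word boundaries.
        Console.WriteLine(BoundaryMatcher.ContainsWord("это тест.", "тест"));
        Console.WriteLine(BoundaryMatcher.ContainsWord("это тестирование", "тест"));
    }
}
```

One thing to keep in mind: \b only matches between a \w character and a non-\w character, so a matchword that starts or ends with punctuation (for example "c++") will not be found this way when it is followed by a space. The (?<!\w)...(?!\w) look-around form does not have that limitation.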

Upvotes: 1
