Reputation: 387
I have the following regular expression to find words in text and highlight them.
I'm using the word surface for testing purposes:
/((?<=[\W])surface?(?![\w]))|((?<![\w])surface?(?=[\W]))/iu
It matches all occurrences in the following text:
surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf
But if I change the first occurrence of surface to contain an uppercase letter, it only matches the first occurrence:
Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf
Or if I put an uppercase letter in some of the other occurrences, it matches those:
Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_Surface_Tare surface_revC.pdf
Reputation: 75272
So you want to match surface case-insensitively unless it's preceded or followed immediately by a letter or digit? Try this:
/(?<![A-Za-z0-9])surface(?![A-Za-z0-9])/i
I left off the /u modifier (which causes the regex and the subject string to be treated as UTF-8) because you appear to be dealing with pure ASCII text. \w, \W and \b are not affected by /u anyway.
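A quick way to check this pattern against the three file names from the question is a small Python sketch. It assumes Python's re module is an acceptable stand-in for PCRE here (the lookarounds behave the same way for this input), with re.IGNORECASE playing the role of /i; the helper name find_matches is hypothetical.

```python
import re

# The pattern from this answer: "surface" not directly touching an ASCII letter or digit.
# re.IGNORECASE corresponds to the /i modifier.
PATTERN = re.compile(r"(?<![A-Za-z0-9])surface(?![A-Za-z0-9])", re.IGNORECASE)

# The three file names from the question.
SAMPLES = [
    "surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf",
    "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf",
    "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_Surface_Tare surface_revC.pdf",
]


def find_matches(text):
    """Return (start offset, matched text) for every occurrence the pattern finds."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    for sample in SAMPLES:
        # Every sample yields three matches, whatever the capitalisation.
        print(find_matches(sample))
```

Because "_" is neither a letter nor a digit, the occurrences next to underscores are found, while something like surface2010 is rejected by the trailing lookahead.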
Reputation: 112240
I have no idea what you're trying to achieve there, but possibly your problem is that \w will include _ (and \W will exclude it).
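The underscore effect can be seen by running the question's own pattern through Python's re module (assumed here to treat these lookarounds the same way PCRE does) on the first sample. Since _ is a word character, the occurrence in ...and_surface_Tare... fails both alternatives, and so does the one in surface_revC.pdf, which has an underscore on its right.

```python
import re

# The question's pattern; re.IGNORECASE stands in for /i.
# (Note that "surface?" makes the final "e" optional.)
QUESTION_PATTERN = re.compile(
    r"((?<=[\W])surface?(?![\w]))|((?<![\w])surface?(?=[\W]))", re.IGNORECASE
)

SAMPLE = (
    "surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_"
    "Programming_and_surface_Tare surface_revC.pdf"
)

# "_" is a word character, so it satisfies \w and fails \W.
print(re.fullmatch(r"\w", "_") is not None)  # → True

# Start offsets of every match: only the occurrence at the very start is found,
# because the other two each touch an underscore.
matches = [m.start() for m in QUESTION_PATTERN.finditer(SAMPLE)]
print(matches)  # → [0]
```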
Maybe try this:
/(?<![a-z])surface(?![a-z])/iu
Or this:
/(?<=[\W_])surface(?=[\W_])/iu
Otherwise, please provide more details on what exactly you do/don't want to match.
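The two suggestions differ at the edges, which a short Python sketch makes visible on the third sample from the question. As before, Python's re module stands in for PCRE (an assumption) and IGNORECASE stands in for /i; the constant names are illustrative only.

```python
import re

# Suggestion 1: "surface" not touching a letter; with IGNORECASE, [a-z] also covers A-Z.
NO_LETTER = re.compile(r"(?<![a-z])surface(?![a-z])", re.IGNORECASE)
# Suggestion 2: "surface" needs a non-word character or "_" on BOTH sides.
DELIMITED = re.compile(r"(?<=[\W_])surface(?=[\W_])", re.IGNORECASE)

SAMPLE = (
    "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_"
    "Programming_and_Surface_Tare surface_revC.pdf"
)

for name, pattern in [("no-letter", NO_LETTER), ("delimited", DELIMITED)]:
    print(name, [m.start() for m in pattern.finditer(SAMPLE)])

# Differences worth knowing about:
# - DELIMITED misses "Surface" at the very start of the string, because a
#   positive lookbehind needs an actual character before the word.
# - NO_LETTER still matches "surface2010", because digits are not in [a-z].
print(bool(NO_LETTER.search("surface2010")), bool(DELIMITED.search("surface2010")))
```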
Update: given this information:
surface2010 should not be matched
In that case, I suspect you want:
/(?<=\b|_)surface(?=\b|_)/iu
(since \b alone would exclude the match in "...and_surface_Tare...", we add the alternation with _ to include it.)
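Python's re module rejects (?<=\b|_) because its look-behind must be fixed width and the two alternatives have widths 0 and 1 (PCRE accepts top-level alternatives of different fixed lengths). A lookbehind containing only \b is the same as a plain \b, so the sketch below assumes the equivalent form (?:\b|(?<=_))surface(?:\b|(?=_)). It then applies the pattern to the highlighting goal from the question; the highlight helper and the <mark> tags are hypothetical choices, and re.sub with \g<0> re-inserts each match as written, so its capitalisation is kept.

```python
import re

# Python equivalent of /(?<=\b|_)surface(?=\b|_)/iu: a word boundary or an
# underscore on each side. (?<=\b) is just \b, so the lookbehind is rewritten
# as an alternation that Python's fixed-width look-behind rule accepts.
WORD = re.compile(r"(?:\b|(?<=_))surface(?:\b|(?=_))", re.IGNORECASE)


def highlight(text, pattern=WORD):
    """Wrap every match in hypothetical <mark> tags, keeping its original case."""
    # \g<0> re-inserts the matched text, so "Surface" stays "Surface".
    return pattern.sub(r"<mark>\g<0></mark>", text)


if __name__ == "__main__":
    print(highlight(
        "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_"
        "Programming_and_Surface_Tare surface_revC.pdf"
    ))
    # No boundary or "_" between "e" and "2", so nothing is highlighted.
    print(highlight("surface2010"))  # → surface2010
```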