oddi

Reputation: 387

What's wrong with this regular expression?

I have the following regular expression to find words in text and highlight them. I'm using the word surface for testing purposes.

/((?<=[\W])surface?(?![\w]))|((?<![\w])surface?(?=[\W]))/iu

It matches all occurrences in the following text.

surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf

But if I change the first occurrence of surface to contain an uppercase letter, it only matches the first occurrence.

Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf

Or if I put an uppercase letter in one of the other occurrences, it matches that one.

Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_Surface_Tare surface_revC.pdf

Upvotes: 0

Views: 119

Answers (3)

Alan Moore

Reputation: 75272

So you want to match surface case-insensitively unless it's preceded or followed immediately by a letter or digit? Try this:

/(?<![A-Za-z0-9])surface(?![A-Za-z0-9])/i

I left off the /u modifier (which causes the regex and the subject string to be treated as UTF-8) because you appear to be dealing with pure ASCII text. \w, \W and \b are not affected by /u anyway.
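The pattern above can be checked against the three file names from the question. This is a minimal sketch in Python's re module, not the asker's environment (the question looks like PCRE, which is an assumption); the pattern uses only syntax both flavors share. The names PATTERN, SAMPLES and find_matches are made up for illustration.

```python
import re

# Pattern from the answer above, ported to Python's re module.
# Assumption: the question targets PCRE; this pattern uses only syntax
# that both flavors share.
PATTERN = re.compile(r"(?<![A-Za-z0-9])surface(?![A-Za-z0-9])", re.IGNORECASE)

# The three file names quoted in the question.
SAMPLES = [
    "surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf",
    "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf",
    "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_Surface_Tare surface_revC.pdf",
]


def find_matches(text):
    """Return the matched substrings, in order of appearance."""
    return [m.group() for m in PATTERN.finditer(text)]


for sample in SAMPLES:
    print(find_matches(sample))

# A digit directly after the word blocks the match.
print(find_matches("surface2010"))
```

Each file name yields three matches regardless of capitalization, and surface2010 yields none, because the negative lookarounds only reject a neighboring letter or digit and treat _, - and the string edges as acceptable.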

Upvotes: 0

Peter Boughton

Reputation: 112240

I have no idea what you're trying to achieve there, but possibly your problem is that \w will include _ (and \W will exclude it).

Maybe try this:

/(?<![a-z])surface(?![a-z])/iu

Or this:

/(?<=[\W_])surface(?=[\W_])/iu

Otherwise, please provide more details on what exactly you do/don't want to match.


Update: given this information:

surface2010 should not be matched

In that case, I suspect you want:

/(?<=\b|_)surface(?=\b|_)/iu

(Just \b would exclude the match in "...and_surface_Tare...", because _ counts as a word character, so we add the alternation with _ to include it.)
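For anyone trying this outside PCRE: Python's re module only accepts fixed-width lookbehind, so (?<=\b|_) does not compile there as written. The sketch below is a hypothetical rewrite with the same intent, moving \b out of the lookaround; BOUNDARY_OR_UNDERSCORE and count_matches are illustrative names, not anything from the question.

```python
import re

# Hypothetical Python port of (?<=\b|_)surface(?=\b|_).
# \b is already zero-width, so it can sit outside the lookaround; only the
# underscore checks need a lookbehind or lookahead.
BOUNDARY_OR_UNDERSCORE = re.compile(
    r"(?:\b|(?<=_))surface(?:\b|(?=_))", re.IGNORECASE
)


def count_matches(text):
    """Count occurrences delimited by a word boundary or an underscore."""
    return len(BOUNDARY_OR_UNDERSCORE.findall(text))


sample = "Surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf"

print(count_matches(sample))              # all three occurrences
print(count_matches("surface2010"))       # digit follows: no match
print(count_matches("and_surface_Tare"))  # underscores on both sides: match
```

The digit in surface2010 is a word character but not an underscore, so neither alternative succeeds after the final e, which is the behavior the update asks for.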

Upvotes: 1

strager

Reputation: 90062

Am I missing something?

/\bsurface\b/i
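One thing to check with \b on this particular file name: the underscore is a word character, so there is no word boundary between _ and a letter. A small sketch in Python's re module (an assumption; the question does not name its language) shows what the pattern finds in the first sample. WORD_BOUNDARY and the other names are illustrative only.

```python
import re

WORD_BOUNDARY = re.compile(r"\bsurface\b", re.IGNORECASE)

sample = "surface-CoP-20-70-0000-04-02_Pre-Run_Tool_Verification_Programming_and_surface_Tare surface_revC.pdf"

# The underscore belongs to \w, so \b cannot match next to it when the
# character on the other side is a letter.
underscore_is_word_char = re.fullmatch(r"\w", "_") is not None

# Start offsets of every match in the sample.
starts = [m.start() for m in WORD_BOUNDARY.finditer(sample)]

print(underscore_is_word_char)
print(starts)
```

Only the occurrence at offset 0 is found. The ones in and_surface_Tare and surface_revC are skipped because an underscore sits directly next to the word, so this pattern fits only if those two should not be highlighted.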

Upvotes: 0
