paparazzo

Reputation: 45106

Regex using word boundary but word ends with a . (period)

I want to match the word i.v., case-insensitively.

I have this pattern:

(?i)\bi\.v\.

but I want a word boundary at the end.
The pattern above fails in that it also matches
i.v.x

But if I try to add a word boundary to the end:

(?i)\bi\.v\.\b

it fails in that it does not even match i.v., as I think the \b is eating the literal . since . is a word break.
I need the \. to be greedy.

I want to match
sam i.v. sam

I do not want to match
sam.i.v.
i.v.sam

This gets closer:

(?i)\bi\.v\.\s$

But it fails to find i.v. at the end of a line

Upvotes: 19

Views: 8788

Answers (4)

Benedict Harris

Reputation: 304

You can also put the boundary in place of the last dot:

(?i)\bi\.v\b

The only drawback is that it will also match i.v (without the trailing dot).
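A quick way to see what this pattern accepts is to run it against a few inputs. This is only a sketch using Python's re module, which is an assumption since the question does not name a regex flavor, and the sample strings beyond those in the question are made up for illustration.

```python
import re

# Pattern from this answer: the trailing \. is replaced by \b.
pattern = re.compile(r"(?i)\bi\.v\b")

# Illustrative inputs; "sam i.v sam" and "i.vx" are not from the question.
samples = ["sam i.v. sam", "sam i.v sam", "i.v.sam", "sam.i.v.", "i.vx"]

results = {text: bool(pattern.search(text)) for text in samples}
for text, found in results.items():
    print(f"{text!r}: {found}")
```

Under these assumptions the pattern finds i.v in every sample except i.vx, so it also matches inside i.v.sam and sam.i.v., which the question wants to exclude.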

Upvotes: 0

anubhava

Reputation: 786291

About your current regex:

You don't need \b after the dot, since the dot is not considered a word character, but of course the dot needs to be escaped:

(?i)\bi\.v\.

But you do need \b before the i to make sure it doesn't match inside a longer word, e.g. in hi.v.

EDIT: (Based on your further edits)

Try this regex:

(?i)\bi\.v\.(?=\s|$)

Upvotes: 2

Tim Pietzcker

Reputation: 336478

\b only matches between a word character (a letter, digit or underscore) and a non-word character (or the start/end of the string). Therefore, it doesn't match after a ., unless a word character immediately follows that dot.
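As a small illustration of that rule, here is a sketch in Python's re module (the flavor is an assumption, since the question does not say which engine is used). It runs the question's second pattern against the two strings from the question.

```python
import re

# The question's attempt: a \b placed right after the final escaped dot.
with_trailing_boundary = re.compile(r"(?i)\bi\.v\.\b")

# The dot is followed by a space: no word character on either side, so \b fails.
wanted = bool(with_trailing_boundary.search("sam i.v. sam"))

# The dot is followed by "x": non-word then word character, so \b succeeds.
unwanted = bool(with_trailing_boundary.search("i.v.x"))

print(wanted, unwanted)
```

So the trailing \b does the opposite of what was intended: it rejects the standalone i.v. and accepts i.v.x.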

If your intent is to make sure that no non-whitespace character follows after the dot, then you can specify that using a negative lookahead assertion:

(?i)\bi\.v\.(?!\S)

(?!\S) means "Assert that the next character is not a non-whitespace character".

This may sound a bit convoluted - why the double negative? Why not (?=\s) which means "Assert that the next character is a whitespace character"? Well, there is a subtle difference: The second version requires a whitespace character to be there; that means the regex would fail to match at the end of the string. The first regex handles that corner case as well.
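The difference between the two lookaheads shows up only at the end of the input. A minimal sketch, again assuming Python's re module and using variable names chosen here for illustration:

```python
import re

requires_space = re.compile(r"(?i)\bi\.v\.(?=\s)")   # positive lookahead
no_nonspace = re.compile(r"(?i)\bi\.v\.(?!\S)")      # negative lookahead

samples = ["sam i.v. sam", "sam i.v.", "i.v.sam"]

# Map each sample to (positive lookahead result, negative lookahead result).
comparison = {
    text: (bool(requires_space.search(text)), bool(no_nonspace.search(text)))
    for text in samples
}
for text, (positive, negative) in comparison.items():
    print(f"{text!r}: (?=\\s) -> {positive}, (?!\\S) -> {negative}")
```

Both versions agree on the first and last sample; only (?!\S) matches "sam i.v.", where nothing follows the dot.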

If you generally want the concept of "word boundary" to mean "space-delimited", then you need to replace the first \b as well:

(?i)(?<!\S)i\.v\.(?!\S)

or the regex will match sam.i.v. which you don't seem to want it to.
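To check the space-delimited version against the examples in the question, something like the following can be used. It is a sketch assuming Python's re module; the inputs "I.V." and "line ends with i.v." are added here for illustration and are not from the question.

```python
import re

# Lookbehind and lookahead both require "no non-whitespace neighbour".
space_delimited = re.compile(r"(?i)(?<!\S)i\.v\.(?!\S)")

should_match = ["sam i.v. sam", "I.V.", "line ends with i.v."]
should_not_match = ["sam.i.v.", "i.v.sam", "i.v.x"]

matched = [text for text in should_match if space_delimited.search(text)]
rejected = [text for text in should_not_match if not space_delimited.search(text)]

print("matched:", matched)
print("rejected:", rejected)
```

Both lookarounds are zero-width, so the match itself is still just i.v. and surrounding whitespace is not consumed.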

Upvotes: 29

Casimir et Hippolyte

Reputation: 89639

You seem to be confused about word boundaries and greediness. The best thing you can do is read these pages:

  • what is a greedy quantifier:

http://www.regular-expressions.info/repeat.html

  • what is a word boundary:

http://www.regular-expressions.info/wordboundaries.html

Once you have read these explanations, I am sure you will think that your problem was ridiculous.

Upvotes: -3
