kobame

Reputation: 5856

Perl-regex word-boundary equivalence

I'm extending my basic regex knowledge, and some things are unclear to me.

Given that \b matches a word boundary, do the following regexes have the same meaning, i.e. will they match the same strings?

/\bword\b/
/(^|\W)word(\W|$)/m    #when multi-line is turned on
/(\A|\W)word(\W|\z)/

I'm asking because \b means word boundary. A word is \w+, so whatever is on the other side of a \b must be something that isn't \w, i.e. it must be \W or the beginning or end of the string or line. (Or not?) (I'm not counting the capture groups; it would probably be better to use non-capturing lookarounds.)

And these two?

/word\B/
/word\w/

If there must be a non-word-boundary at the end of the word, that means that the word must be followed by a \w (word) character. (Or not?)

Upvotes: 1

Views: 3070

Answers (2)

ikegami

Reputation: 385655

(Ignore whitespace in the following patterns. I assumed /x is being used for readability.)


\b

is equivalent to

(?<!\w)(?=\w) | (?<=\w)(?!\w)

so

\b word \b

is equivalent to

(?: (?<!\w)(?=\w) | (?<=\w)(?!\w) ) word (?: (?<!\w)(?=\w) | (?<=\w)(?!\w) )

which simplifies to

(?<!\w) word (?!\w)

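What you suggested as equivalents is slightly different: (^|\W) and (\W|$) consume the neighbouring non-word character when there is one, whereas \b only asserts a position and consumes nothing.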
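To make the difference concrete, here is a small sketch that lists where each pattern matches in a sample string. The helper name match_starts and the sample text are made up for illustration, and the consuming pattern uses non-capturing groups in place of the question's capture groups.

```perl
use strict;
use warnings;

# Return the start offset of every match of $re in $text.
sub match_starts {
    my ($re, $text) = @_;
    my @starts;
    while ($text =~ /$re/g) {
        push @starts, $-[0];    # offset where the whole match begins
    }
    return @starts;
}

# Illustrative sample: three standalone "word"s plus "sword" and "words".
my $text = "word word sword words word";

my @with_b     = match_starts(qr/\bword\b/,              $text);
my @lookaround = match_starts(qr/(?<!\w)word(?!\w)/,     $text);
my @consuming  = match_starts(qr/(?:\A|\W)word(?:\W|\z)/, $text);

print "\\b:          @with_b\n";
print "lookarounds: @lookaround\n";
print "consuming:   @consuming\n";
```

On this input, the \b and lookaround forms report the same three offsets. The consuming form finds only two matches: the space after the first "word" is used up by that match, so it is no longer available as the leading \W for the second "word", and the reported offset of its last match points at the space rather than at the "w".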


\B

is equivalent to

(?<=\w)(?=\w) | (?<!\w)(?!\w)

so

word \B

is equivalent to

word (?: (?<=\w)(?=\w) | (?<!\w)(?!\w) )

which simplifies to

word (?=\w)

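What you suggested as equivalent (word\w) is slightly different: it succeeds in the same places, but it also consumes the following word character, so the matched text is one character longer.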
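A short sketch of that difference, using a made-up sample string and a hypothetical helper named all_matches:

```perl
use strict;
use warnings;

# Return every substring matched by $re in $text.
# With no capture groups, a /g match in list context returns the whole matches.
sub all_matches {
    my ($re, $text) = @_;
    my @found = $text =~ /$re/g;
    return @found;
}

# Illustrative sample: "word" twice in a row, each followed by a word character.
my $text = "wordwords";

my @non_boundary = all_matches(qr/word\B/,      $text);
my @lookahead    = all_matches(qr/word(?=\w)/,  $text);
my @consuming    = all_matches(qr/word\w/,      $text);

print "word\\B:      @non_boundary\n";
print "word(?=\\w):  @lookahead\n";
print "word\\w:      @consuming\n";
```

Here word\B and word(?=\w) both match "word" twice, while word\w matches "wordw" once: it swallows the "w" that the second match would have needed to start with.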

Upvotes: 6

walid toumi

Reputation: 2272

\bword is the same as (?<!\w)word, and word\b is the same as word(?!\w).

\Bword is the same as (?<=\w)word, and word\B is the same as word(?=\w).
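These four identities can be spot-checked by comparing match spans on a handful of strings. The sketch below is only a sanity check on sample inputs chosen for illustration, not a proof, and the helper name spans is hypothetical.

```perl
use strict;
use warnings;

# Describe every match of $re in $text as "start-end" offsets, comma-separated.
sub spans {
    my ($re, $text) = @_;
    my @spans;
    while ($text =~ /$re/g) {
        push @spans, join '-', $-[0], $+[0];
    }
    return join ',', @spans;
}

# Each pair holds an anchored pattern and its claimed lookaround equivalent.
my @pairs = (
    [ qr/\bword/, qr/(?<!\w)word/ ],
    [ qr/word\b/, qr/word(?!\w)/  ],
    [ qr/\Bword/, qr/(?<=\w)word/ ],
    [ qr/word\B/, qr/word(?=\w)/  ],
);

# Illustrative inputs only.
my @samples = ("word", "a word.", "swordfish", "wordword words", "pass-word\n", "");

my $mismatches = 0;
for my $pair (@pairs) {
    for my $sample (@samples) {
        my ($left, $right) = map { spans($_, $sample) } @$pair;
        next if $left eq $right;
        $mismatches++;
        print "differs on '$sample': $pair->[0] vs $pair->[1]\n";
    }
}
print "mismatches: $mismatches\n";
```

Because the word that follows or precedes the anchor fixes one side of the boundary test, only the other side needs a lookaround, which is why each pair agrees on every sample.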

Upvotes: 3
