Zalán Vajda

Reputation: 3

Perl-regex word boundary equivalence

I read that the regex

\ba

is equivalent to

(?<!\w)a

but before that I had figured out that maybe

^a|\Wa

is equivalent too.

My question is: what is the difference between (?<!\w)a and ^a|\Wa? Could somebody give an example where they are not equivalent?

Upvotes: 0

Views: 237

Answers (2)

Nicholas Jones

Reputation: 63

For the string !a, \ba matches only a, while ^a|\Wa matches !a.

This is the shortest example I can provide of why they are NOT equivalent.
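A minimal sketch for checking this yourself: print the text each pattern actually matched. The helper name matched_text is hypothetical and not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text a pattern matched, or undef on no match.
sub matched_text {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? $& : undef;
}

print matched_text('!a', qr/\ba/),    "\n";   # a
print matched_text('!a', qr/^a|\Wa/), "\n";   # !a
```

Both patterns succeed on !a, so a plain true/false test would not show the difference; it only appears in what was matched.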

Upvotes: 0

ikegami

Reputation: 385685

\b is equivalent to (?:(?<!\w)(?=\w)|(?<=\w)(?!\w)), so

\ba is equivalent to (?:(?<!\w)(?=\w)|(?<=\w)(?!\w))a, so

\ba is equivalent to (?<!\w)a, because a matches \w: in front of a, (?=\w) always succeeds and (?!\w) always fails, so only the (?<!\w) check remains.
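One way to sanity-check this chain (a sketch, not a proof) is to compare where each of the three patterns matches across a handful of strings. The sample strings and the helper name match_positions are my own assumptions, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the start offsets of every match of $pattern,
# joined by spaces so the results are easy to compare and print.
sub match_positions {
    my ($string, $pattern) = @_;
    my @starts;
    while ($string =~ /$pattern/g) {
        push @starts, $-[0];   # $-[0] is the offset where the match began
    }
    return "@starts";
}

my @samples = ('a', '!a', 'ba', 'a a', '!@a#$', 'banana a_a');
for my $s (@samples) {
    my $boundary   = match_positions($s, qr/\ba/);
    my $lookbehind = match_positions($s, qr/(?<!\w)a/);
    my $expanded   = match_positions($s, qr/(?:(?<!\w)(?=\w)|(?<=\w)(?!\w))a/);
    print "'$s': [$boundary] [$lookbehind] [$expanded]\n";
}
```

On each line the three bracketed lists should be identical, since all three patterns match at exactly the same offsets.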


Both \ba and (?<!\w)a are similar to both ^a|\Wa and (?:^|\W)a, to the point of being occasionally interchangeable. They differ in what they match: the former two match only the a, while the latter two can also match (and so consume) the non-word character before it. Compare:

use v5.14;                       # enables say and the /r modifier

say '!@a#$' =~ s/\ba//r;         # !@#$
say '!@a#$' =~ s/(?<!\w)a//r;    # !@#$
say '!@a#$' =~ s/^a|\Wa//r;      # !#$
say '!@a#$' =~ s/(?:^|\W)a//r;   # !#$

Upvotes: 2
