dreeves

Reputation: 26952

Why does it seem like the * in Perl regex isn't being greedy?

I expected this to print "[b]" but it prints "[]":

$x = "abc";
$x =~ /(b*)/;
print "[$1]";

If the star is replaced with a plus, it acts as I expect. Aren't both plus and star supposed to be greedy?

ADDED: Thanks everyone for pointing out (within seconds, it seemed!) that "b*" matches the empty string, and the first place it can do so is at the very start of the string, before the 'a'. So greediness is not the issue at all: the pattern matches the empty string before it ever gets to the 'b'.

Upvotes: 3

Views: 327

Answers (6)

kixx

Reputation: 3295

Matching as early as possible has a higher priority than the length of the match (AFAIR this is the case for Perl's regex matching engine, which is an NFA). Therefore a zero-length match at the start of the string wins over a longer match later in the string.

For more information search for "DFA vs NFA" in this article about regex matching engines.
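The leftmost-match priority described above can be observed through the match offsets. This is a minimal sketch, assuming a hypothetical helper named match_span (not part of Perl) that reads the built-in @- and @+ arrays right after a match:

```perl
use strict;
use warnings;

# Hypothetical helper: return the (start, end) offsets of the overall match,
# or an empty list if the pattern does not match at all.
sub match_span {
    my ($str, $re) = @_;
    return unless $str =~ $re;
    return ($-[0], $+[0]);   # @- and @+ hold the offsets of the last match
}

my ($s, $e) = match_span("abc", qr/b*/);
print "b* on abc: start=$s end=$e\n";   # start=0 end=0 (zero-length match)

($s, $e) = match_span("abc", qr/b+/);
print "b+ on abc: start=$s end=$e\n";   # start=1 end=2 (the single 'b')
```

With b* the engine succeeds at offset 0 and never looks further; with b+ it has to advance to offset 1 before it can succeed.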

Upvotes: 1

brian d foy

Reputation: 132858

A * at the end of a pattern is almost always not what you want. We even have this as a trick question in Learning Perl to illustrate just this problem.
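One way to see why a trailing * is rarely useful is to use the pattern as a plain yes/no test. The sketch below is only an illustration, not the exercise from Learning Perl; the helper name matches is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: report 1 if the pattern matches the string, else 0.
sub matches {
    my ($str, $re) = @_;
    return $str =~ $re ? 1 : 0;
}

# Compare b* and b+ on strings with and without a 'b'.
for my $str ("abc", "xyz", "") {
    printf "%-6s b*: %d  b+: %d\n", "'$str'",
        matches($str, qr/b*/), matches($str, qr/b+/);
}
```

Here b* reports a match for all three strings, including the ones with no 'b' at all, so it tells you nothing; b+ only reports a match for "abc".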

Upvotes: 0

Blixt

Reputation: 50179

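The regex starts at the a, where b* is allowed to match zero b's, so it succeeds right there with an empty value and ends. With the + quantifier at least one b is required, so the attempts at a fail, the engine moves on, and the value of $1 becomes b (the match stops before c).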
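The difference between the two quantifiers can be checked side by side. A minimal sketch, assuming a hypothetical helper first_capture that returns $1 or undef:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group, or undef on no match.
sub first_capture {
    my ($str, $re) = @_;
    return $str =~ $re ? $1 : undef;
}

print "[", first_capture("abc", qr/(b*)/), "]\n";   # prints []
print "[", first_capture("abc", qr/(b+)/), "]\n";   # prints [b]
print "[", first_capture("bbc", qr/(b*)/), "]\n";   # prints [bb]
```

The last line shows that * is still greedy once the match starts on a 'b'.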

Upvotes: 3

Logan Capaldo

Reputation: 40346

It is greedy, but b* will match the empty string. anything* will always match the empty string, so:

  "abc"
   ^
   |--- matches the empty string here.

If you print $' you'll see it's abc, which is the rest of the string after the match. Greediness just means that in the case of "bbb", you get "bbb", and not "b" or "bb".
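Both observations (the value of $' and the greedy result on "bbb") can be reproduced with a short script. This is a sketch; the helper name capture_and_rest is made up for the example:

```perl
use strict;
use warnings;

# Hypothetical helper: return the capture and the text after the match ($').
sub capture_and_rest {
    my ($str) = @_;
    $str =~ /(b*)/ or return;   # b* can always match, so this never returns early
    return ($1, $');
}

my ($cap, $rest) = capture_and_rest("abc");
print "capture=[$cap] rest=[$rest]\n";   # capture=[] rest=[abc]

($cap, $rest) = capture_and_rest("bbb");
print "capture=[$cap] rest=[$rest]\n";   # capture=[bbb] rest=[]
```

On "abc" nothing is consumed, so the whole string is left over; on "bbb" the greedy star takes all three characters.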

Upvotes: 10

Adrian Pronk

Reputation: 13906

The regex matches at the earliest point in the string that it can. In the case of 'abc' =~ /(b*)/, that point is right at the beginning of the string where it can match zero b's. If you had tried to match 'bbc', then you would have printed:

[bb]

Upvotes: 3

chaos

Reputation: 124317

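The pattern will match and return at the first position where b* can succeed, i.e. it will perform a zero-width match just before the a. To illustrate more clearly what's going on, do this (the . consumes the z, and b* then matches zero b's because the next character is a):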

$x = "zabc";
$x =~ /(.b*)/;
print "[$1]";
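For comparison, here is the same pattern run against a second string where the dot is followed directly by b's. A small sketch; the helper name dot_then_bs and the input "zbbc" are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: apply /(.b*)/ and return what was captured.
sub dot_then_bs {
    my ($str) = @_;
    return $str =~ /(.b*)/ ? $1 : undef;
}

print "[", dot_then_bs("zabc"), "]\n";   # prints [z]
print "[", dot_then_bs("zbbc"), "]\n";   # prints [zbb]
```

In the first case b* matches zero b's after the z; in the second it greedily takes both.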

Upvotes: 10
