rcout

Reputation: 963

Strange behavior in a Perl regexp with global substitution

Can someone explain to me why the output of this small Perl script is "foofoo" (and not "foo")?

#!/usr/bin/perl -w
use strict;
my $var = "a";
$var =~ s/.*/foo/g;
print $var . "\n";

Without the g option it works as I thought it would, but why does the global option make the pattern match twice?

In bash, the output is "foo", as expected:

echo "a" | sed -e "s/.*/foo/g"

Any explanation would be appreciated.

Upvotes: 17

Views: 556

Answers (4)

Jav_Rock

Reputation: 22245

It is more fun if you try

$var=~s/.*?/foo/g;

You will get

foofoofoo

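The ? after the * quantifier makes it non-greedy (lazy): .*? still matches zero or more characters, but as few as possible. With /g it first matches the empty string before the a, then (because the engine will not accept a second zero-length match at the same position) the a itself, then the empty string at the end, hence three foos. If you remove the g, you will get

fooa

because only the first match is replaced, and that first match is the empty string at the start. I love Perl.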
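A minimal sketch, assuming Perl 5 with strict and warnings, that reproduces both results and also lists what .*? actually matches on each pass of /g. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $input = "a";

# With /g: every match of the lazy .*? is replaced.
(my $global = $input) =~ s/.*?/foo/g;

# Without /g: only the first match (the empty string at offset 0) is replaced.
(my $single = $input) =~ s/.*?/foo/;

# In list context, m//g with one capture group returns each captured substring in order.
my @matches = ($input =~ /(.*?)/g);

print "global: $global\n";    # foofoofoo
print "single: $single\n";    # fooa
print "matches: ", join(", ", map { "'$_'" } @matches), "\n";    # '', 'a', ''
```

The empty first element is the lazy match at offset 0; the 'a' only shows up because Perl refuses a second zero-length match at the same offset and forces .*? to consume a character.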

Upvotes: 8

toolic

Reputation: 62037

If you enable the debug mode of the re pragma in your code:

use re 'debug';

you will see that the regular expression successfully matches twice:

Compiling REx `.*'
size 3 Got 28 bytes for offset annotations.
first at 2
   1: STAR(3)
   2:   REG_ANY(0)
   3: END(0)
anchored(MBOL) implicit minlen 0
Offsets: [3]
        2[1] 1[1] 3[0]
Matching REx ".*" against "a"
  Setting an EVAL scope, savestack=5
   0 <> <a>               |  1:  STAR
                           REG_ANY can match 1 times out of 2147483647...
  Setting an EVAL scope, savestack=5
   1 <a> <>               |  3:    END
Match successful!
Matching REx ".*" against ""
  Setting an EVAL scope, savestack=7
   1 <a> <>               |  1:  STAR
                           REG_ANY can match 0 times out of 2147483647...
  Setting an EVAL scope, savestack=7
   1 <a> <>               |  3:    END
Match successful!
Matching REx ".*" against ""
  Setting an EVAL scope, savestack=7
   1 <a> <>               |  1:  STAR
                           REG_ANY can match 0 times out of 2147483647...
  Setting an EVAL scope, savestack=7
   1 <a> <>               |  3:    END
Match possible, but length=0 is smaller than requested=1, failing!
                            failed...
Match failed
foofoo
Freeing REx: `".*"'
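The same two successful matches can be seen without the debug output by looping over m//g in scalar context and reading the offset arrays @- and @+ after each match. A minimal sketch, assuming a reasonably modern Perl 5; @spans is just an illustrative name:

```perl
use strict;
use warnings;

my $input = "a";
my @spans;

# In scalar context, m//g resumes from pos() after each successful match.
# $-[0] and $+[0] are the start and end offsets of the whole match.
while ($input =~ /.*/g) {
    push @spans, "[" . $-[0] . "," . $+[0] . ")";
}

print join(" ", @spans), "\n";    # [0,1) [1,1)
```

The loop stops after [1,1): the next attempt would be another zero-length match at offset 1, which Perl rejects (perlre describes this under "Repeated Patterns Matching a Zero-length Substring"). That is the "length=0 is smaller than requested=1" line in the log above.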

Upvotes: 6

KJP

Reputation: 519

This is because you're using .* instead of .+.

The * quantifier matches zero or more characters, so with /g the regex engine first matches (and, in your example, replaces) the string "a", and then matches (and replaces) the zero-length string that follows it.

You can test this by using this regex in your sample code:

$var=~s/(.*)/<$1>/g;

You'll then see this output:

"<a><>"
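Extending that test to compare .* with .+ side by side; a small sketch assuming Perl 5 with strict and warnings, where $star and $plus are illustrative names:

```perl
use strict;
use warnings;

my $input = "a";

# Wrap every match in <...> so zero-length matches become visible as "<>".
(my $star = $input) =~ s/(.*)/<$1>/g;    # .* also matches the empty string at the end
(my $plus = $input) =~ s/(.+)/<$1>/g;    # .+ needs at least one character

print "$star\n";    # <a><>
print "$plus\n";    # <a>
```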

Upvotes: 6

ysth

Reputation: 98398

First .* matches the a, then it matches the empty string after the a. Maybe you want .+ instead?
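One thing to keep in mind when switching to .+ is that it behaves differently on an empty string. The sketch below compares s/.*/foo/g, s/.+/foo/g and s/.*/foo/ (no /g) on a few inputs; the helper names star_g, plus_g and star_one are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers: each applies one substitution to a copy of its argument.
sub star_g   { (my $s = shift) =~ s/.*/foo/g; return $s }
sub plus_g   { (my $s = shift) =~ s/.+/foo/g; return $s }
sub star_one { (my $s = shift) =~ s/.*/foo/;  return $s }

for my $input ("a", "abc", "") {
    print join("  ", "'$input'", star_g($input), plus_g($input), star_one($input)), "\n";
}
```

For "a" and "abc" the last two variants both give foo, but for "" the .+ version leaves the string empty, while .* without /g still yields foo because the single empty match at offset 0 is replaced.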

Upvotes: 12
