Ibrahim Quraish

Reputation: 4095

Perl questions: when does the word boundary matter, and why does '@' need escaping with $1 but not with \1?

In Perl, I can drop the word boundary from

$_ =~ /\b[A-Z]\S*/;

and write

$_ =~ /[A-Z]\S*/;

which gives the same result for the input "A home For a Person".

So in what circumstances does the word boundary '\b' make a difference?


Also, when I use back-references of the form $1, $2, etc., the special character '@' needs to be escaped as '\@':

echo gmail.in@x1 | perl -pe 's/(\S+)@(.*)/$2\@$1/'    # Ans: x1@gmail.in

But when I use back-references of the form \1, \2, etc., I do not need to escape the '@' character in the replacement part:

echo gmail.in@x1 | perl -pe 's/(\S+)@(.*)/\2@\1/'   # Ans: x1@gmail.in

Why do the two forms behave differently?

Upvotes: 1

Views: 82

Answers (2)

Sabuj Hassan

Reputation: 39365

Let's start with the second one. In Perl, @ is the sigil for arrays. Please take a look at this example:

my @a = qw(a c v);  # an array
my $ref = \@a;      # take a reference to the array
print @$ref;        # the @ sigil dereferences it back to the array

Another example:

my $str = "abc";   # a random string
$str =~ /(.)/;     # matching a character into $1
print "ok @$1";    # output: ok
print "ok \@$1";   # output: ok @a

In the example above, the first print outputs only "ok ". Because @ comes directly before $1, Perl treats $1 as an array reference and tries to dereference it. $1 holds the string "a", so (without strict refs) @$1 is resolved as the package array @a, which is empty here. The second print outputs "ok @a" because the escaped \@ is a literal at-sign and $1 holds "a" from the preceding regex match.
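The same behavior can be checked under use strict, where the unescaped form fails loudly instead of silently printing nothing. This is only a minimal sketch; the variable names $escaped, $unescaped and $error are made up for the illustration.

```perl
use strict;
use warnings;

my $str = "abc";
$str =~ /(.)/;                        # $1 now holds "a"

# Escaped: \@ is a literal at-sign, followed by the interpolated $1.
my $escaped = "ok \@$1";

# Unescaped: "@$1" asks Perl to dereference $1 as an array.
# Under strict refs the plain string "a" is not a reference, so this dies;
# eval traps the error and leaves $unescaped undefined.
my $unescaped = eval { "ok @$1" };
my $error     = $@;

print "$escaped\n";
print "unescaped form died: $error" unless defined $unescaped;
```

With strict enabled, the accidental dereference is reported at run time, which makes the missing backslash much easier to spot.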

Now for the first question. I have changed the input string slightly here.

my $str = 'aaA home For a Person';
## case-1
if ($str =~ /(\b[A-Z]\S*)/) {
    print "$1";    ## output: For
}
## case-2    
if ($str =~ /([A-Z]\S*)/) {
    print "$1";    ## output: A
}

As you can see, the outputs differ. \b matches at a word boundary: a position with a word character (\w) on one side and a non-word character, or the start or end of the string, on the other. In the first case the A is preceded by aa, so there is no boundary before it. The regex skips that A and matches at the next capital letter, F, which is preceded by a space (a non-word character).
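To see every match rather than just the first, the two patterns can be run with the /g modifier in list context. This is a small sketch using the same input string; the array names are chosen only for this example.

```perl
use strict;
use warnings;

my $str = 'aaA home For a Person';

# In list context, /g returns every match of the pattern.
my @with_boundary    = $str =~ /\b[A-Z]\S*/g;
my @without_boundary = $str =~ /[A-Z]\S*/g;

print "with \\b:    @with_boundary\n";      # For Person
print "without \\b: @without_boundary\n";   # A For Person
```

Only the version without \b picks up the A buried inside "aaA".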

Upvotes: 0

TLP

Reputation: 67900

In your first question, the "result" you speak of is merely that they can both fail or succeed. You are not actually capturing a string, so your question is somewhat moot. However, the word boundary will prevent partial matching, for example:

'foobar' =~ /\b(bar)/;    # will not match
'foobar' =~ /(bar)/;      # will match

The word boundary is a zero-width assertion that matches the position between a word character and a non-word character (or the start or end of the string), i.e. a boundary around a word.
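One way to visualize those zero-width positions is to substitute a visible marker at every place \b matches. The sketch below uses an arbitrary sample string and a pipe character as the marker; both are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $text = 'foo bar-baz';

# Insert a pipe at every position where \b matches.
(my $marked = $text) =~ s/\b/|/g;
print "$marked\n";                    # |foo| |bar|-|baz|

# \b before "bar" needs a non-word character (or the string start) on its left.
my $inside_word = ('foobar'  =~ /\bbar/) ? 'match' : 'no match';
my $after_space = ('foo bar' =~ /\bbar/) ? 'match' : 'no match';
print "foobar:  $inside_word\n";      # no match
print "foo bar: $after_space\n";      # match
```

Note that the markers land between characters, never on them, which is what "zero-width" means here.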

Your second question is simply that @$1 is the dereferencing of a reference, and @\1 is not. If you store an array reference in a scalar variable, and want to dereference it to access the original array, you place an @ sign in front of it, like so:

my @array = (1, 2, 3);
my $aref  = \@array;
my @new   = @$aref;      # @new now contains 1,2,3

That being said, using \1 is not recommended. If you turn warnings on, your statement will give the following warning:

\1 better written as $1 at -e line 1.
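Following that warning, the substitution from the question can be written with $1 and $2 and an escaped at-sign. This is a sketch, not the only way to do it. The variable names are made up, and escaping the @ inside the pattern as well is a defensive choice rather than something the original one-liner required.

```perl
use strict;
use warnings;

my $addr = 'gmail.in@x1';

# Copy the string, then swap the parts around the at-sign.
# \@ keeps Perl from reading "@$1" as an array dereference.
(my $swapped = $addr) =~ s/(\S+)\@(.*)/$2\@$1/;

print "$swapped\n";                   # x1@gmail.in
```

This runs cleanly with warnings enabled, unlike the \2@\1 form.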

Upvotes: 1
