Reputation: 4095
In Perl, I can write
$_ =~ /\b[A-Z]\S*/;
without the word boundary, as
$_ =~ /[A-Z]\S*/;
and get the same result for the input "A home For a Person".
So in what circumstances does the word boundary '\b' make a difference?
Also, when I use back-references of the form $1, $2, etc., the special character '@' needs to be escaped as '\@':
echo gmail.in@x1 | perl -pe 's/(\S+)@(.*)/$2\@$1/' # Ans: x1@gmail.in
But when I use the \1, \2, etc. form of back-reference, I do not need to escape the '@' character in the replacement part:
echo gmail.in@x1 | perl -pe 's/(\S+)@(.*)/\2@\1/' # Ans: x1@gmail.in
Why the difference in behavior?
Upvotes: 1
Views: 82
Reputation: 39365
Let's start with the second question. In Perl, @ is the sigil for arrays, so it is a special character. Take a look at this example:
my @a = qw(a c v); # an array
my $ref = \@a; # take a reference to the array
print @$ref; # the @ sigil dereferences it back into the array
Another example:
my $str = "abc"; # a random string
$str =~ /(.)/; # matching a character into $1
print "ok @$1"; # output: ok
print "ok \@$1"; # output: ok @a
In the above example, the first print outputs only ok. The @ directly before $1 makes Perl treat $1 as an array reference and dereference it. Since $1 holds the string a, and strict is not enabled, @$1 is a symbolic reference to the array @a, which is empty. The second print outputs ok @a because the escaped \@ is a literal at-sign, and $1 still holds a from the previous regex match.
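The interpolation rules above can be checked in a small script. This is only a sketch: the variable names ($interpolated, $escaped, $died) are made up for illustration, and it assumes `use strict` is on, in which case "@$1" is a fatal error instead of an empty list.

```perl
use strict;
use warnings;

my @letters = qw(a c v);
my $ref     = \@letters;

# Unescaped: "@$ref" dereferences the array and joins its elements with a space.
my $interpolated = "items: @$ref";    # items: a c v

my $str = 'abc';
$str =~ /(.)/ or die "no match";      # $1 is now "a"

# Escaped: "\@" is a literal at-sign, followed by the value of $1.
my $escaped = "ok \@$1";              # ok @a

# Under strict refs, dereferencing the string in $1 dies at run time.
my $died = !defined eval { my $s = "ok @$1"; 1 };

print "$interpolated\n$escaped\n";
print "\"\@\$1\" died under strict\n" if $died;
```

Running with strict enabled is a cheap way to catch an unescaped @ in front of a capture variable.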
Now to the first question. I am changing the input string a bit here.
my $str = 'aaA home For a Person';
## case-1
if ($str =~ /(\b[A-Z]\S*)/) {
print "$1"; ## output: For
}
## case-2
if ($str =~ /([A-Z]\S*)/) {
print "$1"; ## output: A
}
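As you can see, the outputs are different here. \b is a zero-width assertion that matches only at a position between a word character (\w) and a non-word character, or at the start or end of the string next to a word character. In the first case, A has aa before it, so there is no boundary in front of it. The regex therefore skipped the A and went on to the next capital letter, F, which has a space (a non-word character) before it.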
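To see every match rather than just the first one, the two patterns can be run in list context with /g. A minimal sketch on the same input string; the array names @with_b and @without_b are illustrative only.

```perl
use strict;
use warnings;

my $text = 'aaA home For a Person';

# With \b, a capital letter counts only if it starts a word.
my @with_b = $text =~ /\b([A-Z]\S*)/g;      # For, Person

# Without \b, the A in the middle of "aaA" matches as well.
my @without_b = $text =~ /([A-Z]\S*)/g;     # A, For, Person

print "with \\b:    @with_b\n";
print "without \\b: @without_b\n";
```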
Upvotes: 0
Reputation: 67900
In your first question, the "result" you speak of is merely that they can both fail or succeed. You are not actually capturing a string, so your question is somewhat moot. However, the word boundary will prevent partial matching, for example:
'foobar' =~ /\b(bar)/; # will not match
'foobar' =~ /(bar)/; # will match
The word boundary is a zero-width assertion that matches the position between a word character and a non-word character (or the start or end of the string), i.e. a boundary around a word.
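One way to make these zero-width positions visible is to insert a marker at each of them. A small sketch, with the helper name mark_boundaries invented for the illustration:

```perl
use strict;
use warnings;

# Return a copy of the string with "|" inserted at every word boundary.
sub mark_boundaries {
    my ($s) = @_;
    $s =~ s/\b/|/g;
    return $s;
}

my $two_words = mark_boundaries('foo bar');   # |foo| |bar|
my $one_word  = mark_boundaries('foobar');    # |foobar|

print "$two_words\n$one_word\n";
```

There is no marker between foo and bar in 'foobar', which is why /\b(bar)/ fails on that string.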
Your second question is simply that @$1 is the dereferencing of a reference, and @\1 is not. If you store an array reference in a scalar variable, and want to dereference it to access the original array, you place an @ sign in front of it, like so:
my @array = (1, 2, 3);
my $aref = \@array;
my @new = @$aref; # @new now contains 1,2,3
That being said, using \1 in the replacement part is not recommended. If you turn warnings on, your statement will give the following warning:
\1 better written as $1 at -e line 1.
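For comparison, here is a sketch of the same substitution written with $1 and $2 in a script that has strict and warnings enabled. The variable names are illustrative, and the input is the one from the question.

```perl
use strict;
use warnings;

my $addr = 'gmail.in@x1';

# Copy the input, then swap the two halves around the at-sign.
# The replacement is interpolated like a double-quoted string,
# so the literal @ is escaped to keep it away from $1.
(my $swapped = $addr) =~ s/(\S+)@(.*)/$2\@$1/;

print "$swapped\n";    # x1@gmail.in
```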
Upvotes: 1