Reputation: 41
I am seeing some strange behavior with grouping in Perl.
Below is a file snippet I have:
nmos MNANT2(sam_1_,sam_1_,sam_1_);
nmos MNANT1(sam[0],sam[0],sam[0]);
nmos MNANT3(ovstb,ovstb,ovstb);
nmos M3(net14, VSS, in);
Basically, I am trying to match the lines where all three fields inside the parentheses are the same.
I tried it with the following one-liner:
perl -nle 'm/(.+?\((.+?),$2,$2\).+)/ && print $1' new
It doesn't work, but this one works fine:
perl -nle 'm/(.+?\((.+?),\2,\2\).+)/ && print $1' new
So my question is: why doesn't $2 work here while \2 does? Shouldn't we be using "$" for backreferences, as I did with $1 at the end?
And if "\" works everywhere, I tried using \1 instead of $1, like this:
perl -nle 'm/(.+?\((.+?),\2,\2\).+)/ && print \1' new
Instead of the matched lines, it prints:
SCALAR(0x1a49678)
SCALAR(0x1a49678)
SCALAR(0x1a49678)
What am I fundamentally missing here? Looking forward to hearing from the experts.
Upvotes: 1
Views: 136
Reputation: 385655
You seem to think the regex patterns and Perl code are the same language. a+b in a regex pattern isn't addition, and \2 outside a regex isn't an instruction to match the second capture.
perl -nle 'm/(.+?\((.+?),$2,$2\).+)/ && print $1' new
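doesn't work because $2 is an ordinary Perl variable, and it is interpolated into the pattern before the pattern is even compiled. At that point $2 holds whatever the previous successful match captured; in your one-liner nothing has matched yet, so it is undefined and interpolates as an empty string, giving the pattern (.+?\((.+?),,\).+), which none of your lines match.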
perl -nle 'm/(.+?\((.+?),\2,\2\).+)/ && print $1' new
works because the regex atom \2 means "match what the second capture captured."
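The difference between the two one-liners can be checked with a small script. It is a sketch under one assumption: to make the interpolation visible, it first runs an unrelated match that leaves "VSS" in $2 (in the original one-liner $2 is simply undefined instead). The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = 'nmos MNANT2(sam_1_,sam_1_,sam_1_);';

# Simulate an earlier successful match that leaves "VSS" in $2.
'nmos M3(net14,VSS,in);' =~ /(\w+)\(\w+,(\w+)/;

# Inside m//, "$2" is plain variable interpolation done before the regex
# is compiled, so the pattern actually used is \((.+?),VSS,VSS\).
my $with_dollar = ($line =~ /\((.+?),$2,$2\)/) ? 1 : 0;

# "\2" is a regex backreference: it must match whatever group 2
# captured earlier in this same match attempt.
my $with_backslash = ($line =~ /(.+?\((.+?),\2,\2\).+)/) ? 1 : 0;

print "with \$2: $with_dollar, with \\2: $with_backslash\n";   # with $2: 0, with \2: 1
```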
perl -nle 'm/(.+?\((.+?),\2,\2\).+)/ && print \1' new
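doesn't work because \ is Perl's reference-taking operator. Outside a regex, \1 builds a reference to the constant 1, and printing a reference shows its type and address, which is the SCALAR(0x1a49678) output you got.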
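A short script can show what \1 evaluates to outside a regex and how the captured text is normally retrieved instead. This is a sketch; $ref and $captured are illustrative names, and the sample line is taken from the question.

```perl
use strict;
use warnings;

# Outside a regex, a backslash creates a reference:
# \1 is a reference to the constant 1, not capture group 1.
my $ref = \1;
print ref($ref), "\n";    # prints SCALAR
print "$ref\n";           # prints something like SCALAR(0x...)
print ${$ref}, "\n";      # dereferencing gives back 1

# The captured text is available as $1, or as the list returned
# by a match in list context (here the first element is $1).
my ($captured) = 'nmos MNANT3(ovstb,ovstb,ovstb);' =~ /(.+?\((.+?),\2,\2\).+)/;
print "$captured\n";      # prints nmos MNANT3(ovstb,ovstb,ovstb);
```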
Upvotes: 2
Reputation: 38919
The m// and print commands are separate commands joined by an &&.
Within a regex, \2 is a backreference to the second capture, whose value will be assigned to the $2 variable after the regex has finished matching. Outside the regex, \2 is not a backreference at all; only $2 is a variable that can be accessed. See here for more info: http://perldoc.perl.org/perlretut.html#Backreferences
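When reading that link, note that since Perl 5.10, \2 is still recognized but \g2 (or \g{2}) is encouraged. This is because \11 is ambiguous: it could mean the eleventh capture, an octal escape (character 011, a tab), or backreference 1 followed by a literal 1.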
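Putting both points together, here is a sketch of the original filter written with \g{2} (which needs Perl 5.10 or later), reading $1 and $2 only after the match has succeeded. The sample lines come from the question; @matched and @fields are illustrative names.

```perl
use strict;
use warnings;
use 5.010;    # \g{N} backreferences need Perl 5.10+

my @lines = (
    'nmos MNANT2(sam_1_,sam_1_,sam_1_);',
    'nmos MNANT1(sam[0],sam[0],sam[0]);',
    'nmos MNANT3(ovstb,ovstb,ovstb);',
    'nmos M3(net14, VSS, in);',
);

my (@matched, @fields);
for my $line (@lines) {
    # \g{2} is the unambiguous spelling of the backreference \2.
    # A backreference matches the captured text literally, so "sam[0]" works.
    if ($line =~ /(.+?\((.+?),\g{2},\g{2}\).+)/) {
        # Only after a successful match do $1 and $2 hold this line's captures.
        push @matched, $1;
        push @fields,  $2;
    }
}
print "$_\n" for @matched;    # the first three lines; the M3 line is skipped
```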
Upvotes: 1