Reputation: 347
#!/usr/bin/perl
@lines = `perldoc -u -f atan2`;
foreach (@lines) {
s/\w<([^>]+)>/\U$1/g;
print;
}
How does the expression s/\w<([^>]+)>/\U$1/g; work?
Upvotes: 3
Views: 192
Reputation: 958
Here is another way to figure out what it is doing: use the module YAPE::Regex::Explain from CPAN.
Using it like this (on just the match part of the search and replace):
use strict;
use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new(qr/\w<([^>]+)>/)->explain();
Will give this output:
The regular expression:
(?-imsx:\w<([^>]+)>)
matches as follows:
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \w                       word characters (a-z, A-Z, 0-9, _)
----------------------------------------------------------------------
  <                        '<'
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^>]+                    any character except: '>' (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  >                        '>'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
The replacement part of the expression says that the text captured between "group and capture to \1" and "end of \1" should be converted to uppercase. The whole match, including the leading word character and the angle brackets, is replaced by that uppercased text.
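To see the match and the replacement working together, here is a minimal sketch. The helper name upcase_pod_codes and the sample strings are made up for illustration; only the s/// expression comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the substitution from the question
# to a copy of the input and returns the result.
sub upcase_pod_codes {
    my ($text) = @_;
    # The whole "\w<...>" match is replaced by the uppercased capture,
    # so the leading letter and the angle brackets disappear.
    $text =~ s/\w<([^>]+)>/\U$1/g;
    return $text;
}

print upcase_pod_codes("X<atan2> X<tan>"), "\n";              # ATAN2 TAN
print upcase_pod_codes("Returns the C<atan2> value"), "\n";   # Returns the ATAN2 value
```

Text that does not match the pattern, such as the spaces between the codes, is left untouched.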
Upvotes: 4
Reputation: 264381
A Perl foreach loop looks like this:
foreach $item (@array)
{
# Code in here. ($item takes a new value from array each iteration)
}
But Perl lets you leave out the variable in many places. When you do, the special variable $_ is used.
So in your case:
foreach (@lines)
{
}
Is exactly the same as:
foreach $_ (@lines)
{
}
Now inside the body the following code:
s/\w<([^>]+)>/\U$1/g;
Has the same thing happening. The substitution always works on a variable, and when you do not specify one Perl defaults to $_.
Thus it is the equivalent of:
$_ =~ s/\w<([^>]+)>/\U$1/g;
Combine the two:
foreach (@lines) {
s/\w<([^>]+)>/\U$1/g;
print;
}
Is equivalent to:
foreach $item (@lines)
{
$item =~ s/\w<([^>]+)>/\U$1/g;
print $item;
}
I use $item just for readability. It plays the same role that $_ plays in the short form.
Lots of Perl code uses this type of shortcut. Personally I think it makes the code harder to read, even for experienced Perl programmers, and it is one of the reasons Perl got a reputation for unreadability. As a result I always try to be explicit about the variables I use, although that is not typical Perl style.
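The explicit form can be tried on its own with a small sketch like the one below. The sample data is an assumption standing in for the perldoc output, and the loop variable is declared with my so that it also runs under use strict.

```perl
use strict;
use warnings;

# Sample data standing in for the output of the perldoc command.
my @lines = ("X<atan2>\n", "X<tan>\n");

for my $line (@lines) {
    # $line is an alias for the current element, so the
    # substitution changes the contents of @lines itself.
    $line =~ s/\w<([^>]+)>/\U$1/g;
    print $line;
}
```

Because the loop variable is an alias rather than a copy, the same is true of $_ in the short form: the array elements are modified in place.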
Upvotes: 0
Reputation: 67900
The substitution does this:
s/
\w< # look for a single word character (letter, digit or underscore) followed by <
([^>]+) # capture one or more characters that are not >
> # followed by a >
/ ### replace with
\U # change following text to uppercase
$1 # the captured string from above
/gx # /g means do this as many times as possible per line
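I added the /x modifier to be able to visualize the regex. Note that /x only applies to the pattern, not to the replacement, so the comments on the replacement side are there for illustration only. The character class [^>] is negated, as denoted by the ^ character after the [, which means "any character except >".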
For example, this line in the output from the perldoc command:
X<atan2> X<arctangent> X<tan> X<tangent>
is changed to:
ATAN2 ARCTANGENT TAN TANGENT
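As a runnable sketch of the same substitution, the version below keeps the comments inside the pattern only, since /x does not affect the replacement part. The variable $line and its contents are taken from the example line above.

```perl
use strict;
use warnings;

my $line = "X<atan2> X<arctangent> X<tan> X<tangent>";

$line =~ s/
    \w<        # one word character followed by <
    ([^>]+)    # capture one or more characters that are not >
    >          # followed by a >
/\U$1/gx;      # replace each match with the uppercased capture

print "$line\n";   # ATAN2 ARCTANGENT TAN TANGENT
```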
Upvotes: 4