Reputation: 37159
G'day,
I am using the following Perl fragment to extract output from a Solaris cluster command.
open(CL,"$clrg status |");
my @clrg= grep /^[[:lower:][:space:]]+/,<CL>;
close(CL);
I get the following when I print the elements of the array @clrg (BTW, the "=>" and "<=" line delimiters are inserted by my print statement):
=><=
=>nas-rg mcs0.cwwtf.bbc.co.uk No Online<=
=> mcs1.cwwtf.bbc.co.uk No Offline<=
=><=
=>apache-rg mcs0.cwwtf.bbc.co.uk No Online<=
=> mcs1.cwwtf.bbc.co.uk No Offline<=
=><=
When I replace it with the following Perl fragment, the blank lines are no longer matched.
open(CL,"$clrg status |");
my @clrg= grep /^[[:lower:][:space:]]{3,}/,<CL>;
close(CL);
And I get the following:
=>nas-rg mcs0.cwwtf.bbc.co.uk No Online<=
=> mcs1.cwwtf.bbc.co.uk No Offline<=
=>apache-rg mcs0.cwwtf.bbc.co.uk No Online<=
=> mcs1.cwwtf.bbc.co.uk No Offline<=
The simple question is: why?
BTW, using {1,} in the second Perl fragment also matches blank lines!
Any suggestions gratefully received!
cheers,
Upvotes: 1
Views: 244
Reputation: 238296
That'll be because [:space:] matches newlines and carriage returns as well.
So [[:space:]]+ would match \n, \r\n, or \n\n. But [[:space:]]{3,} would require three characters, and an empty line is just a \n.
{1,} and + mean the same thing: match the preceding group one or more times.
P.S. A typical newline is \n on Unix and \r\n on Windows.
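To see this without a cluster handy, here is a minimal sketch that runs the three quantifiers from the question over a few made-up lines. The sample strings are my own stand-ins shaped like the clrg output above, not real command output.

```perl
use strict;
use warnings;

# Hypothetical sample input: one blank line, one resource-group line,
# and one indented continuation line, shaped like the output in the question.
my @lines = (
    "\n",
    "nas-rg mcs0.cwwtf.bbc.co.uk No Online\n",
    " mcs1.cwwtf.bbc.co.uk No Offline\n",
);

# The same character class with the three quantifiers under discussion.
my @plus  = grep { /^[[:lower:][:space:]]+/ }    @lines;
my @one   = grep { /^[[:lower:][:space:]]{1,}/ } @lines;
my @three = grep { /^[[:lower:][:space:]]{3,}/ } @lines;

printf "+    keeps %d of %d lines\n", scalar @plus,  scalar @lines;
printf "{1,} keeps %d of %d lines\n", scalar @one,   scalar @lines;
printf "{3,} keeps %d of %d lines\n", scalar @three, scalar @lines;
```

The blank line is a single \n, which is enough for + and {1,} but one character is too few for {3,}, so only that last pattern drops it.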
Upvotes: 9
Reputation: 6753
Hm. According to the Perl regular expression documentation, the [:space:] character class should not include newlines, as it is supposed to be the equivalent of \s (except that it recognizes an additional character, vertical-tab, to maintain POSIX compliance).
However, having just tested this on 5.10.0, I can verify that it is matching newlines as well. Whether this qualifies as a bug in Perl or in the documentation, I'll leave for the Perl maintainers. But to avoid the immediate problem, use the previous answerer's solution and just use \s instead of the POSIX class.
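It is worth double-checking on your own Perl before relying on that swap. As far as I know, \s matches \n as well, so changing the class alone may not drop the blank lines. The sketch below checks how each class treats a newline, then shows one possible way to skip blank lines by also requiring a non-whitespace character. The sample lines and variable names are my own assumptions, not taken from real clrg output.

```perl
use strict;
use warnings;

# Check which classes match a bare newline on this Perl.
my $nl = "\n";
my %matches_newline = (
    '\s'          => ( $nl =~ /\s/          ? 1 : 0 ),
    '[[:space:]]' => ( $nl =~ /[[:space:]]/ ? 1 : 0 ),
    '[[:blank:]]' => ( $nl =~ /[[:blank:]]/ ? 1 : 0 ),
);
print "$_ matches newline: $matches_newline{$_}\n"
    for sort keys %matches_newline;

# Hypothetical sample input shaped like the output in the question.
my @lines = (
    "\n",
    "nas-rg mcs0.cwwtf.bbc.co.uk No Online\n",
    " mcs1.cwwtf.bbc.co.uk No Offline\n",
);

# Keep the original class, but also insist on at least one
# non-whitespace character so that blank lines are filtered out.
my @kept = grep { /^[[:lower:][:space:]]+/ && /\S/ } @lines;

for my $line (@kept) {
    chomp( my $copy = $line );    # work on a copy; leave @kept untouched
    print "=>$copy<=\n";
}
```

The extra /\S/ test is only one option. Anchoring on a lowercase letter after optional leading whitespace would do a similar job.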
Upvotes: 1