Reputation: 75
I need some help with this perl regular expression
s/.*?<\?lsmb if\s*?(\S*)\s*?\?>/$1/
in the code below parsing out some non-whitespace chars [A-Z][a-z][0-9][_]
surrounded by any number of whitespace and the other chars. I have tried various Perl regular expressions which are all commented out in the program below.
My main problem I think is stopping matching at the end.
The code below runs 8 tests, and I am hoping to find something that passes all 8.
#!/usr/bin/perl
use strict;
use warnings;
use diagnostics;
my $count = 0;
my $t = 0;
#examples of things I need to match, match => catagory
my $self = { 'customerfax' => 'alpha',
'_____' => 'Underscore',
'000000' => 'numeric',
'letter_reason_4' => 'alfa-numeric-underscore',
'customerphone7' => 'alfa-numeric',
'customer_phone' => 'alfa-underscore',
};
# must contain <?lsmb 'varname from $self' ?>
# may contain any amount of whitespace chars where one is depected
# will end with \n that is removed by chop below
my $test1 = qq|<?lsmb if customerfax ?> caacaacac\n|;
my $test2 = qq|<?lsmb if _____ ?> bbb\n|;
my $test3 = qq|<?lsmb if 000000 ?> cccc\n|;
my $test4 = qq|<?lsmb if letter_reason_4 ?><t \></'><><><>\n|; # /
my $test5 = qq| <?lsmb if customerfax ?> |;
my $test6 = qq|<?lsmb if customerphone7 ?> \<?lsmb ignore this >n|;
my $test7 = qq|<?lsmb if customer_phone ?>\n|;
my $test8 = qq| avcscc 34534534 <?lsmb if letter_reason_4 ?> 0xffff\n|;
strip_markup($test1);
strip_markup($test2);
strip_markup($test3);
strip_markup($test4);
strip_markup($test5);
strip_markup($test6);
strip_markup($test7);
strip_markup($test8);
if ($count == 8) { print "Passed All done\n";}
else { print "All done passed $count out of 8 Try again \n"; }
sub strip_markup {
$_= shift;
#print "strip_markup $_ \n";
if (/<\?lsmb if /) {
chop; # gets rid ot the new line
#original
#s/.*?<\?lsmb if (.+?) \?>/$1/;
#What I have tried:
#s/.*?<\?lsmb if(?:\s)*?(\S+?)(?:\s)*?\?>\b/$1/;
s/.*?<\?lsmb if\s*?(\S*)\s*?\?>/$1/;
#s/.*?<\?lsmb if\s*?([A-Za-z0-9_]*?)\s*?\?>/$1/;
#s/.*?<\?lsmb if[\s]*?(\S*?)[\s]*?\?>/$1/;
#s/.*?<\?lsmb if (\S*?) \?>/$1/;
#s/.*?<\?lsmb if (\S+?) \?>/$1/;
#s/.*?<\?lsmb if ([\S]+?)([\s]+?)\?>/$1/;
#s/.*?<\?lsmb if[\s]+([\S]+)[\s]+\?>/$1/;
#s/.*?<\?lsmb if\s*?([\S]*?)\s*?\?>/$1/;
#s/.*?<\?lsmb if\s+?([\S]+?)[\s]+?\?>/$1/;
#s/.*?<\?lsmb if ([\S]+?) \?>/$1/;
#s/.*?<\?lsmb if\s*?([\S_]*?)\s*?\?>/$1/;
#s/.*?<\?lsmb if\s*?([[a-zA-Z]|[\d]|[_]]*?)\s*?\?>/$1/;
#s/.*?<\?lsmb if\s*?([a-zA-Z\d_]*?)\s*?\?>/$1/;
#s/.*?<\?lsmb if\s*?([^[:space:]]+?)\s*?\?>/$1/;
$t++;
print "Test $t ";
#look up the result as the hash key
my $ok = $self->{$_};
if ($ok) {
$count++;
print "OK passed $ok,";
}
print qq|Test Value : '$_' \n|;
}
}
Here are some of the Tests and what they should return:
<?lsmb if customerfax ?> caacaacac\n
should return customerfax
<?lsmb if _____ ?> bbb\n
should return _____
avcscc 34534534 <?lsmb if letter_reason_4 ?> 0xffff\n
should return letter_reason_4
Upvotes: 3
Views: 743
Reputation: 66873
If my understanding of your requirements is right, the needed phrase is extracted by simple
my ($match) = $string =~ /<\?lsmb \s+ if \s+ (\w+)/x
In the list context the match operator m//
returns a list with matches. Even if it's just one, we need the list context – in the scalar context its behavior is different. The list context comes from assigning to a list from it, my (...) =
. The /x
modifier merely allows us to use spaces inside, for readability. See perlretut for starters.
What may precede <?
doesn't have to be specified, since the pattern matches anywhere in the string. The \w
is for [A-Za-z0-9_]
(see perlrecharclass), what seems to match your examples and description. The \S
is more permissive. Nothing is needed after \w+
.
Also, there is no need to first test whether the pattern is there
sub strip_markup
{
my ($test_res) = $_[0] =~ /<\?lsmb if (\w+)/;
if ($test_res) {
# ...
}
return $test_res; # return something!
}
There is no reason for the substitution so we use a match.
I understand that you are working with code you can't change, but would still like to comment
No need to remove the newline here. But when you do that, use chomp and not chop
The sub uses global variables. That can lead to bugs. Declare in small scope. Pass
The sub modifies global variables. That often leads to bugs while there is rarely need for it
Use arrays for repetitions of the same thing
This can be organized differently, to separate work more clearly
For example
my @tests = (
qq|<?lsmb if customerfax ?> caacaacac\n|,
# ...
);
my ($cnt, $t);
foreach my $test (@tests)
{
my $test_res = strip_markup($test);
if (defined $test_res) {
$t++;
print "Test $t ";
#look up the result as the hash key
my $ok = $self->{$test_res};
if ($ok) {
$count++;
print "OK passed $ok,";
}
print qq|Test Value : '$_' \n|;
}
else { } # report failure
}
sub strip_markup {
my ($test_res) = $_[0] =~ /<\?lsmb \s+ if \s+ (\w+)/x;
return $test_res;
}
The defined
test of $test_res
is to allow for falsey things (like 0
or ''
) to be valid results.
The reporting code can, and should be, in another subroutine.
Upvotes: 1