Reputation: 595
My script needs to insert the pattern (?:<\/?[a-z\-\=\"\ ]+>)? into words after each letter, so that the result can be used in another regular expression. The problem is that some words may already contain a regex pattern such as .*? or (?:<[a-z\-]+>). When I tried it, Perl threw an "unmatched" regex error. The error appears where my pattern gets inserted after a ( , or where a space in the regex is involved. Any help would be appreciated.
Here is the code I tried:
sub process_info{
    my $process_mod = shift;
    #print "$process_mod\n";
    @b = split('',$process_mod);
    my $flag;
    for my $i(@b){
        #print "@@@@@@@@ flag: $flag test: $i\n";
        $i = "$i".'(?:<\/?[a-z\-\=\"\ ]>)?' if $flag == 0 and $i !~ /\\|\(|\)|\:|\?|\[|\]/;
        #print "$i";
        if ($i =~ /\\|\(|\)|\:|\?|\[|\]/){
            $flag = 1;
        }
        else{
            $flag = 0;
        }
        #print "After: $i\n";
    }
    $process_mod = join('',@b);
    #print "$process_mod\n";
    return $process_mod;
}
Upvotes: 1
Views: 57
Reputation: 91385
At the beginning of the foreach loop, use this:
for my $i(@b){
$i = quotemeta $i;
$i .= '(?:<\/?[a-z\-\=\"\ ]>)?' if $flag == 0 and $i !~ /[\\|():?[\]]/;
# don't escape __^
Upvotes: 1
Reputation: 57600
You want to search for a certain plaintext in an XML file. You try to do this by inserting a regex for an XML tag between each character. This is wasteful, but it can be easily done by escaping all metacharacters in the input with the quotemeta function:
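sub make_XML_matchable {
    my ($string) = @_;  # list assignment; a scalar assignment would store the argument count
    my $xml_tag = qr{ ... }; # I won't write that regex for you
    # escape every character, then join the characters with the tag regex
    my $combined = join $xml_tag, map quotemeta, split //, $string;
    return qr/$combined/; # return a compiled regex
}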
This assumes that you'd want to write a regex that can match XML tags – not impossible, but tedious and difficult to do correctly. Use an XML parser instead to strip all tags from a section:
use XML::LibXML;
my $dom = XML::LibXML->load_xml(string => $xml);
my $text_content = $dom->textContent; # all tags are gone
Or if you're actually trying to match HTML, then you might want to use Mojolicious:
use Mojo::DOM;
my $dom = Mojo::DOM->new($html);
my $text_content = $dom->all_text; # all tags are replaced by a space
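If you do stay with the regex route, here is a minimal sketch of how the quotemeta approach above could look when filled in. It assumes the simple tag pattern from the question (made optional), which is not a general XML tag matcher, and it treats every character of the input as literal text, so something like .*? in the input is matched literally rather than as a regex. The name make_tag_tolerant is only illustrative.

```perl
use strict;
use warnings;

# Assumption: the tag pattern from the question, optional between characters.
# It only covers simple lowercase tags and is not a real XML tag matcher.
my $tag = qr{(?:</?[a-z\-="\ ]+>)?};

# Hypothetical helper: build a regex that matches $string even when
# simple tags are interleaved between its characters.
sub make_tag_tolerant {
    my ($string) = @_;
    # Escape each character on its own, then join with the optional tag pattern.
    my $combined = join $tag, map { quotemeta($_) } split //, $string;
    return qr/$combined/;
}

my $re = make_tag_tolerant('a.b');
print "match\n"    if 'a<b>.</b>b' =~ $re;  # tags between the characters are skipped
print "no match\n" if 'axb' !~ $re;         # the dot was escaped, so it is literal
```

Because the dot is escaped before joining, 'a.b' only matches a literal dot, which is the point of running quotemeta on each character first.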
Upvotes: 2