waghso

Reputation: 595

Escaping regex pattern in string while using regex on the same

My script has to insert the pattern (?:<\/?[a-z\-\=\"\ ]+>)? after each letter of a word, so that the result can be used in another regular expression. The problem is that some words may themselves contain regex syntax such as .*? or (?:<[a-z\-]+>). My attempt throws an "unmatched" regex error, because my pattern gets inserted right after a ( or after a space inside the existing regex. Any help would be appreciated.

Here is the code I tried:

sub process_info{
    my $process_mod = shift;
    #print "$process_mod\n";
    @b = split('',$process_mod);

    my $flag;
    for my $i(@b){


        #print "@@@@@@@@ flag: $flag test: $i\n";
        $i = "$i".'(?:<\/?[a-z\-\=\"\ ]>)?' if $flag == 0 and $i !~ /\\|\(|\)|\:|\?|\[|\]/;
        #print "$i";

        if ($i =~ /\\|\(|\)|\:|\?|\[|\]/){
            $flag = 1;
        }
        else{
            $flag = 0;
        }


        #print "After: $i\n";
    }

    $process_mod = join('',@b);

    #print "$process_mod\n";
    return $process_mod;
}

Upvotes: 1

Views: 57

Answers (2)

Toto

Reputation: 91385

At the beginning of the foreach loop, use this:

for my $i(@b){
    $i = quotemeta $i;
    $i .= '(?:<\/?[a-z\-\=\"\ ]>)?' if $flag == 0 and $i !~ /[\\|():?[\]]/;
    #            don't escape __^
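
As a small illustration of what quotemeta does to each split character (a sketch only; the sample input 'a.*?' is an assumption, not taken from the question):

```perl
use strict;
use warnings;

# quotemeta backslash-escapes every character that is not a word character,
# so each piece is matched literally once it is interpolated into a regex.
my @pieces  = split //, 'a.*?';
my @escaped = map { quotemeta($_) } @pieces;

print join(' ', @escaped), "\n";   # prints: a \. \* \?
```

Note that an escaped metacharacter now starts with a backslash, which is why the character class in the condition above still has to list the backslash.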

Upvotes: 1

amon

Reputation: 57600

You want to search for a certain plaintext in an XML file. You try to do this by inserting a regex for an XML tag between each character. This is wasteful, but it can be easily done by escaping all metacharacters in the input with the quotemeta function:

sub make_XML_matchable {
  my ($string) = @_;  # list assignment; scalar assignment would store the argument count
  my $xml_tag = qr{ ... };  # I won't write that regex for you
  my $combined = join $xml_tag, map quotemeta, split //, $string;
  return qr/$combined/;  # return a compiled regex
}
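
For illustration only, here is how that could look end to end. The tag pattern below is a hypothetical, deliberately simplified stand-in for the elided regex (it is an assumption of this sketch and will not handle real-world XML), and the sub name make_tag_tolerant_regex is made up for the example:

```perl
use strict;
use warnings;

# Hypothetical, simplified tag pattern (an assumption for this sketch):
# it only covers simple tags such as <b>, </i> or <span class="x">.
my $optional_tag = qr{(?:</?[a-z\-="\s]+>)?};

sub make_tag_tolerant_regex {
    my ($string) = @_;
    # Escape each character so it is matched literally, then allow an
    # optional tag between neighbouring characters.
    my $combined = join $optional_tag, map { quotemeta($_) } split //, $string;
    return qr/$combined/;
}

my $re = make_tag_tolerant_regex('a.b');
print "match\n"    if 'x a<b>.</b>b y' =~ $re;   # tags between the letters are skipped
print "no match\n" if 'axb' !~ $re;              # the dot is literal, so "x" does not match
```

Because every character is escaped before joining, regex syntax inside the input (such as the dot here) is treated as plain text rather than as a pattern.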

This assumes that you'd want to write a regex that can match XML tags – not impossible, but tedious and difficult to do correctly. Use an XML parser instead to strip all tags from a section:

use XML::LibXML;

my $dom = XML::LibXML->load_xml(string => $xml);
my $text_content = $dom->textContent; # all tags are gone

Or if you're actually trying to match HTML, then you might want to use Mojolicious:

use Mojo::DOM;

my $dom = Mojo::DOM->new($html);
my $text_content = $dom->all_text;  # all tags are replaced by a space

Upvotes: 2
