Using Perl to strip everything from a string except HTML Anchor Links

Question

Using Perl, how can I use a regex to take a string that has random HTML in it with one HTML link with anchor, like this:

  <a href="...">Whatever Example</a>

and have it leave ONLY that and get rid of everything else? It should work no matter what else is inside the <a ...> tag along with the href attribute, such as title=, or style=, or whatever, and it should leave the anchor text "Whatever Example" and the closing </a>.

Sinan Ünür · Accepted Answer

You can take advantage of a stream parser such as HTML::TokeParser::Simple:

#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

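# NOTE: the href and title values below are placeholders; the attributes
# of the original sample link were not preserved.
my $html = <<EO_HTML;
   <a href="http://example.com/" title="Example">Whatever Interesting Example</a>

       and it leave ONLY that and get rid of everything else? No matter what
   was inside the href attribute with the ?
EO_HTML

my $parser = HTML::TokeParser::Simple->new(string => $html);

# get_tag('a') skips ahead to the next opening <a ...> tag;
# get_text('/a') returns the plain text up to the closing </a>.
while (my $tag = $parser->get_tag('a')) {
    print $tag->as_is, $parser->get_text('/a'), "\n";
}

Output:

$ ./whatever.pl
<a href="http://example.com/" title="Example">Whatever Interesting Example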
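
Note that the loop above prints the opening tag and the link text, but not the closing </a>, which the question also asked to keep. A minimal sketch of one way to keep it is to walk the tokens one at a time with get_token and collect the opening tag, the text inside it, and the closing tag. The helper name extract_anchors and the sample string are made up for illustration, and the sketch assumes links are not nested:

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use HTML::TokeParser::Simple;

# Hypothetical helper: returns every <a ...>text</a> found in $html,
# dropping any markup nested inside the link (e.g. <i>...</i>).
sub extract_anchors {
    my ($html) = @_;

    my $parser = HTML::TokeParser::Simple->new(string => $html);
    my @links;
    my $current;    # defined only while we are inside an <a> element

    while (my $token = $parser->get_token) {
        if ($token->is_start_tag('a')) {
            # Keep the opening tag exactly as written, attributes included.
            $current = $token->as_is;
        }
        elsif (defined $current) {
            if ($token->is_end_tag('a')) {
                push @links, $current . $token->as_is;
                undef $current;
            }
            elsif ($token->is_text) {
                $current .= $token->as_is;
            }
            # Any other token inside the link is skipped.
        }
    }

    return @links;
}

# Made-up sample input, not the asker's actual string.
my $sample = 'junk <b>bold</b> '
    . '<a href="http://example.com/" title="t">Whatever <i>Interesting</i> Example</a>'
    . ' more junk';

print "$_\n" for extract_anchors($sample);
# prints: <a href="http://example.com/" title="t">Whatever Interesting Example</a>
```

Because as_is returns each token exactly as it appeared in the input, whatever attributes the link carries (title=, style=, and so on) come through untouched.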
