user5887573

Catching Perl HTML::TreeBuilder::XPath module exceptions

I have written some test code that grabs the content of an article from a website. However, the code fails on an invalid HTML attribute. How do I go about catching the invalid attribute exception? Or is there a way around the problem?

Here's my code for grabbing the article content:

#!/usr/bin/perl -w             

use HTML::LinkExtor;           
use LWP::Simple;
use HTML::TreeBuilder::XPath;  
use Term::ProgressBar;


my $url = "http://www.totalpolitics.com/blog/159117/campaigning-to-keep-our-coastguards.thtml";
my $content = get $url;

my $tree = HTML::TreeBuilder::XPath->new_from_content($content); 

my $title = $tree->findvalue(q{//div[@id="article"]/h1});
my $body = $tree->findnodes_as_string(q{//div[@class="article-body"]}) // '';
my $author = $tree->findnodes(q{//div[@class="article-body"]/p/strong});
$author = $author->[0]->getValue;

my $xml = '<?xml version="1.0" encoding="UTF-8" ?>';
$xml .= '<nodes>';
$xml .= '<node>';
$xml .= '<url>';
$xml .= $url;
$xml .= '</url>';
$xml .= '<title>';
$xml .= $title;
$xml .= '</title>';
$xml .= '<description>';
$xml .= "<![CDATA[$body]]>";
$xml .= '</description>';
$xml .= '<author>';
$xml .= $author;
$xml .= '</author>';
$xml .= "</node>\n";
$xml .= "</nodes>";

print $xml;

Error

span has an invalid attribute name ' _fck_bookmark' at /home/getmizanur/perl5/lib/perl5/XML/XPathEngine.pm line 125

Upvotes: 2

Views: 103

Answers (1)

realmaniek

Reputation: 513

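  1. Perl has traditionally had no try/catch statement (a try feature was added only as experimental in 5.34). Fatal errors can instead be trapped with an eval { } block: wrap the code that may die inside it. If that code fails, eval returns undef and the error message is placed in $@.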

Take a look at the following example:

sub do_something {
    print "something\n";
}

sub do_something_else {
    print "something_else\n";
}

eval {
  do_something();
  print 1/0; # ouch
  do_something_else();

};
# $@ is a special variable holding the error message from the last eval
if( $@ ) {
   warn "Error occurred: $@";
}

  2. Consider using a template engine instead of concatenating raw strings this way. You could try HTML::Template, for example.
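
Applied to the question's script, the eval approach from the first point might look roughly like the sketch below. It is only an illustration: try_or is a hypothetical helper name, and broken_serializer is a stand-in that dies with the same message as in the question, so the example runs with core Perl only and does not need HTML::TreeBuilder::XPath installed. It also tests eval's return value rather than $@ alone, a common precaution because $@ can be reset before it is inspected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): run a code ref inside
# eval and return its result, or $fallback if the code dies.
sub try_or {
    my ($code, $fallback) = @_;
    my $result;
    # The trailing 1 makes eval return true on success, so success is
    # judged by eval's return value instead of by $@ alone.
    my $ok = eval { $result = $code->(); 1 };
    unless ($ok) {
        my $err = $@ || "unknown error\n";
        warn "Recovered from error: $err";
        return $fallback;
    }
    return $result;
}

# Stand-in (assumption) for $tree->findnodes_as_string(...): it dies
# with the same kind of message as the one shown in the question.
sub broken_serializer {
    die "span has an invalid attribute name ' _fck_bookmark'\n";
}

# The failing call falls back to an empty string instead of aborting.
my $body = try_or(\&broken_serializer, '');

# A call that succeeds simply passes its value through.
# 'Example title' is placeholder data.
my $title = try_or(sub { 'Example title' }, '');

print "body is empty\n" if $body eq '';
print "title: $title\n";
```

In the original script the first call would instead wrap the real lookup, for example try_or(sub { $tree->findnodes_as_string(q{//div[@class="article-body"]}) }, ''), so that a page with a bad attribute yields an empty description rather than stopping the program.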

Upvotes: 1
