Reputation: 15
I want to extract the HTML code of a TWiki page (whose URL I have). What is the best way of doing that?
Additionally, once I have extracted the HTML code, I need to put it on a site hosted on Google Sites. Is that possible?
Upvotes: 2
Views: 59
Reputation: 13792
A very simple way to get an HTML page is the LWP::Simple module. If you need a more complex navigation flow (logging in, following links, submitting forms), use WWW::Mechanize instead. Then, if you need to parse the HTML code, @brian's solution is good.
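For the simple case, a minimal sketch with LWP::Simple might look like the following. The helper name `fetch_html` and the URL in the commented usage lines are placeholders of mine, not anything from the question; `get` is the LWP::Simple function that returns the page body, or `undef` on failure.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: return the page body as a string, or die on failure.
sub fetch_html {
    my ($url) = @_;
    my $html = get($url);    # LWP::Simple::get returns undef if the fetch fails
    die "Could not fetch $url\n" unless defined $html;
    return $html;
}

# Placeholder URL; substitute the address of your TWiki topic.
# my $html = fetch_html("http://twiki.example.org/bin/view/Main/WebHome");
# print $html;
```

If the TWiki requires a login, this will probably only return the login page, which is the point where WWW::Mechanize becomes the better fit.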
Upvotes: 2
Reputation: 272267
Sounds like you need the CPAN HTML::Parser module.
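use strict;
use warnings;
use HTML::Parser ();
# Handlers called for every start and end tag; here they just print the tag name
sub start { my ($tagname, $attr) = @_; print "<$tagname>\n"; }
sub end { my ($tagname) = @_; print "</$tagname>\n"; }
# Create parser object
my $p = HTML::Parser->new( api_version => 3,
start_h => [\&start, "tagname, attr"],
end_h => [\&end, "tagname"],
marked_sections => 1,
);
# Parse directly from file (parse_file returns undef if the file cannot be opened)
$p->parse_file("foo.html") or die "Cannot parse foo.html: $!";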
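If the HTML is already in memory (for example, fetched over HTTP rather than saved to a file), the parser can be fed a string with `parse` followed by `eof`. This is only a sketch: the helper name `tag_names` and the sample markup are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: return the names of all start tags found in an HTML string.
sub tag_names {
    my ($html) = @_;
    my @tags;
    my $p = HTML::Parser->new(
        api_version => 3,
        # The handler receives the tag name as its only argument
        start_h     => [ sub { push @tags, shift }, "tagname" ],
    );
    $p->parse($html);    # feed the document (may be called repeatedly with chunks)
    $p->eof;             # signal end of input so buffered text is flushed
    return @tags;
}

# Sample markup for illustration only
my @tags = tag_names("<html><body><p>hi</p></body></html>");
print join(", ", @tags), "\n";
```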
Upvotes: 1