Reputation: 83
I'm scraping this site and am looking for code examples that show how to retrieve the information inside this JSP control (it's a lot harder than with regular text!). I can't find anything useful in the HTTP headers. Here's my code so far:
use strict;
use warnings;
use Carp qw(croak);
use Data::Dumper;
use HTML::TreeBuilder;
use WWW::Mechanize;

my $mech_r = WWW::Mechanize->new();
my $uri = 'http://global.krx.co.kr/contents/GLB/02/0203/0203000000/GLB0203000000.jsp';
print "Getting '$uri'\n";
my $page = $mech_r->get($uri);
print "Parsing data...\n";
my $root = HTML::TreeBuilder->new_from_content($mech_r->content());
# Find the first <ul class="board-list"> element in the parsed page.
my ($news_table) = $root->look_down(
    sub {
        defined($_[0]->tag()) and
        $_[0]->tag() eq 'ul' and
        defined($_[0]->attr('class')) and
        $_[0]->attr('class') eq 'board-list'
    }
);
if (!defined($news_table)) {
    print Dumper($root);
    croak "Could not get the news table";
}
I would like to get the title, date, and link of each entry, but I'm not getting any data because the list is loaded by JavaScript.
Upvotes: 0
Views: 511
Reputation: 242038
As documented, WWW::Mechanize doesn't handle JavaScript. Try WWW::Mechanize::Firefox, WWW::Scripter, WWW::Selenium, WWW::Mechanize::PhantomJS, or similar.
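As a rough sketch of that suggestion, the following assumes WWW::Mechanize::PhantomJS and the phantomjs binary are installed. The helper names fetch_rendered_html and extract_links are made up for illustration, the fixed two-second wait is a crude guess at how long the page's scripts need, and the assumption that each entry is an <a> inside the board-list <ul> comes only from the question's code, not from inspecting the site. The date is left out because its markup is not known here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Fetch a page in a JavaScript-capable browser and return the rendered HTML.
sub fetch_rendered_html {
    my ($uri) = @_;
    require WWW::Mechanize::PhantomJS;    # loaded lazily; needs the phantomjs binary
    my $mech = WWW::Mechanize::PhantomJS->new();
    $mech->get($uri);
    sleep 2;    # assumption: crude pause so the page's scripts can fill the list
    return $mech->content();
}

# Return a list of { title, link } hashes for every <a> inside
# <ul class="board-list">. The per-entry markup is an assumption.
sub extract_links {
    my ($html) = @_;
    my $root = HTML::TreeBuilder->new_from_content($html);
    my $list = $root->look_down(_tag => 'ul', class => 'board-list');
    my @entries;
    if ($list) {
        for my $anchor ($list->look_down(_tag => 'a')) {
            push @entries, {
                title => $anchor->as_trimmed_text,
                link  => $anchor->attr('href'),
            };
        }
    }
    $root->delete;    # free the parse tree
    return @entries;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    for my $entry (extract_links(fetch_rendered_html($ARGV[0]))) {
        printf "%s\t%s\n", $entry->{title}, $entry->{link} // '';
    }
}
```

Keeping the parsing in its own function means it can be checked against saved HTML without launching a browser, and the fetching half can be swapped for any of the other modules listed above.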
Upvotes: 3