BashingPerl

Reputation: 83

Scraping JavaScript-loaded content of a site using Perl

I'm scraping this site and am looking for code examples that show how to retrieve the information inside this JSP control (it's a lot harder than with regular text!). I can't find anything useful in the HTTP headers. Here's my code so far:

 use strict;
 use warnings;
 use Carp qw(croak);
 use Data::Dumper;
 use HTML::TreeBuilder;
 use WWW::Mechanize;

 my $mech_r = WWW::Mechanize->new();

 my $uri = 'http://global.krx.co.kr/contents/GLB/02/0203/0203000000/GLB0203000000.jsp';
 print "Getting '$uri'\n";
 $mech_r->get($uri);

 print "Parsing data...";
 my $root = HTML::TreeBuilder->new_from_content($mech_r->content());

 # Find the first <ul class="board-list"> element in the fetched HTML.
 my ($news_table) = $root->look_down(
     _tag  => 'ul',
     class => 'board-list',
 );

 if (!defined($news_table)) {
     # Dump the parsed tree to see what the server actually returned.
     print Dumper($root);

     croak "Could not get the news table";
 }

I would like to get the title, date and the link.

But I'm not getting any data, because it is loaded by JavaScript.

Upvotes: 0

Views: 511

Answers (1)

choroba

Reputation: 242038

As documented, WWW::Mechanize doesn't handle JavaScript. Try WWW::Mechanize::Firefox, WWW::Scripter, WWW::Selenium, WWW::Mechanize::PhantomJS, or similar.
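
A rough sketch of how this could look with WWW::Mechanize::PhantomJS, reusing the HTML::TreeBuilder lookup from the question. Several things here are assumptions rather than facts about the site: that the phantomjs binary is installed and on the PATH, that the entries appear as li elements inside ul.board-list a few seconds after the page loads, and that each entry contains a link whose text is the title. The helper names fetch_rendered and extract_news are made up for illustration. Where the date sits in the markup is not known from the question, so it is not extracted here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Fetch a page through a headless browser so that its JavaScript runs.
# Assumption: the phantomjs binary is installed and on the PATH.
sub fetch_rendered {
    my ($uri) = @_;
    require WWW::Mechanize::PhantomJS;
    my $mech = WWW::Mechanize::PhantomJS->new();
    $mech->get($uri);

    # Assumption: the list is filled in shortly after load.
    # Poll for up to about 10 seconds, then give up and return what we have.
    for (1 .. 10) {
        my @found = $mech->selector('ul.board-list li', any => 1);
        last if @found;
        sleep 1;
    }
    return $mech->content;
}

# Pull title/link pairs out of the rendered HTML.
# Assumption: every entry is a link somewhere inside <ul class="board-list">.
sub extract_news {
    my ($html) = @_;
    my $root = HTML::TreeBuilder->new_from_content($html);
    my $list = $root->look_down(_tag => 'ul', class => 'board-list');

    my @items;
    if ($list) {
        for my $anchor ($list->look_down(_tag => 'a')) {
            push @items, {
                title => $anchor->as_trimmed_text,
                link  => $anchor->attr('href') // '',
            };
        }
    }
    $root->delete;    # free the parse tree
    return @items;
}

# Only fetch when a URL is given on the command line.
if (my $uri = shift @ARGV) {
    for my $item (extract_news(fetch_rendered($uri))) {
        print "$item->{title}\t$item->{link}\n";
    }
}
```

Run it with the page URL as the only argument. If the list stays empty, it may be worth checking in the browser's network tab whether the page loads its data from a separate request, since the markup assumed above may not match the real page.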

Upvotes: 3
