Madoc Comadrin

Reputation: 568

Selecting menu item using PhantomJS

I have a simple PhantomJS script that renders a website's JavaScript-generated content to HTML. (Some data is then extracted from the HTML using another tool.)

var page = require('webpage').create();
var fs = require('fs');// File System Module
var output = '/tmp/sourcefile'; // path for saving the local file
page.open('targeturl', function() { // open the file
  fs.write(output,page.content,'w'); // Write the page to the local file using page.content
  phantom.exit(); // exit PhantomJs
});

(I got these lines of code from http://kochi-coders.com/2014/05/06/scraping-a-javascript-enabled-web-page-using-beautiful-soup-and-phantomjs/)

This used to work when every target had a direct link. Now they all share the same URL, and a drop-down menu selects between them:

<select id="observation-station-menu" name="station" onchange="updateObservationProductsBasedOnForm(this);">
  <option value="101533">Alajärvi Möksy</option>
  ...    
  <option value="101541">Äänekoski Kalaniemi</option>
  </select>

This is the menu item I would actually like to load:

<option value="101632">Joensuu Linnunlahti</option>

Because of this menu, my script only downloads the data for the default location. How do I load the content for another menu item and download that item's HTML instead?

My target site is this: http://ilmatieteenlaitos.fi/suomen-havainnot

(If there is a better way than PhantomJS to do this, I could use it just as well. My interest is in working with the data once I have scraped it, and I chose PhantomJS only because it was the first thing that worked. Some options might be limited because my server is a Raspberry Pi and they might not work on it; see "Python Selenium: Firefox profile error".)

Upvotes: 2

Views: 1350

Answers (2)

user5542121

Reputation: 1052

You could directly call the function that is defined in the page's underlying JavaScript:

var page = require('webpage').create();
var fs = require('fs');// File System Module
var output = '/tmp/sourcefile'; // path for saving the local file
page.open('targeturl', function() { // open the file
  page.evaluate(function() {
     updateObservationProducts(101632, 'weather');
  });
  window.setTimeout(function () {
    fs.write(output,page.content,'w'); // Write the page to the local file using page.content
    phantom.exit(); // exit PhantomJs
  }, 1000); // Change timeout as required to allow sufficient time
});

For waiting until the page has rendered, see "phantomjs not waiting for 'full' page load"; I copied part of rhunwicks's solution from there.
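The fixed 1000 ms delay above could be replaced by polling until the page looks ready. What follows is only a minimal sketch and not part of the linked solution: `waitFor` is a hypothetical helper name, and the readiness check in the usage comment assumes the page fetches its data through jQuery AJAX (so that the undocumented `jQuery.active` counter returns to 0 when idle), which has not been verified for this site.

```javascript
// Hypothetical helper: poll `check` until it returns true or `timeoutMs` elapses.
// Calls done(true) when the check passes and done(false) on timeout.
// The first check runs immediately; later checks run every `intervalMs`.
function waitFor(check, done, timeoutMs, intervalMs) {
  var start = Date.now();
  (function poll() {
    if (check()) {
      done(true);
      return;
    }
    if (Date.now() - start >= timeoutMs) {
      done(false);
      return;
    }
    setTimeout(poll, intervalMs);
  })();
}

// Possible usage inside the page.open callback, after the page.evaluate call
// (page, fs and output as in the script above; the jQuery.active check is an
// assumption about how the site loads its data):
//
//   waitFor(function () {
//     return page.evaluate(function () {
//       return typeof jQuery !== 'undefined' && jQuery.active === 0;
//     });
//   }, function (ready) {
//     fs.write(output, page.content, 'w');
//     phantom.exit(ready ? 0 : 1);
//   }, 10000, 250);
```

Polling trades a little complexity for not having to guess a delay that is long enough on a slow machine such as a Raspberry Pi.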

Upvotes: 1

Gustavo F

Reputation: 2206

Since the page has jQuery, you can do:

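var page = require('webpage').create();
var fs = require('fs'); // File System Module
var output = '/tmp/sourcefile'; // path for saving the local file
page.open('targeturl', function() { // open the page
  page.evaluate(function() {
    // select the station in the drop-down, then fire its change event
    jQuery('#observation-station-menu').val('101632').change();
  });
  // give the page time to update before saving; adjust the delay as needed
  window.setTimeout(function() {
    fs.write(output, page.content, 'w'); // Write the page to the local file using page.content
    phantom.exit(); // exit PhantomJs
  }, 1000);
});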
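If jQuery were not available on the page, the same selection could probably be made with plain DOM calls. This is a minimal sketch rather than a tested solution for this site: `selectOption` is a hypothetical name, and its optional third parameter exists only so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: set a <select> to `value` and fire its change event,
// which also runs an inline onchange="..." handler on the element.
// Returns false if the element is missing or the value did not take effect
// (in a real browser that happens when no option has that value).
// It references no outer variables, so it can be handed to page.evaluate.
function selectOption(selectId, value, doc) {
  doc = doc || document; // inside the page, fall back to the real document
  var select = doc.getElementById(selectId);
  if (!select) {
    return false;
  }
  select.value = value;
  var evt = doc.createEvent('HTMLEvents');
  evt.initEvent('change', true, false); // bubbles, not cancelable
  select.dispatchEvent(evt);
  return select.value === value;
}

// Possible usage from the PhantomJS script (extra arguments to page.evaluate
// are passed to the function inside the page):
//
//   var ok = page.evaluate(selectOption, 'observation-station-menu', '101632');
```

As with the jQuery version, the page still needs time to update before `page.content` is written out.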

Upvotes: 3
