conquester

Reputation: 1132

Creating multiple instances of PhantomJS or only one?

I have a script that needs to crawl a website. For every request (each URL), I initialize a new web driver with Selenium/PhantomJS. Is this approach unscalable, and will it cost a lot of CPU over time? Should I instead create a single driver, keep it in a global variable, and reuse it for all requests? Would that lower CPU usage, or would it not make much difference?

Upvotes: 1

Views: 1718

Answers (2)

th3an0maly

Reputation: 3510

PhantomJS has an embedded web server (Mongoose) that you can start and send requests to. This avoids having to initialize PhantomJS for every URL, and warming up is quite costly in PhantomJS.

Here is some sample web server code in PhantomJS that you could start with:

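var port = 9494;
var server = require('webserver').create();
var page = require('webpage').create();

// Runs inside the page context; return a JSON-serializable value.
var your_method = function (data) {
    // Do stuff here
};

// Assumes the POST body is JSON with a "url" field naming the page to open.
var service = server.listen(port, function (request, response) {
  var input = JSON.parse(request.post);
  page.open(input.url, function (status) {
    var result = page.evaluate(your_method, input);
    // Send the result back and close the response so the client is not left waiting.
    response.statusCode = (status === 'success') ? 200 : 502;
    response.write(JSON.stringify({ status: status, result: result }));
    response.close();
  });
});

if (service) {
  console.log('Server running on port ' + port);
} else {
  console.log('Error: Could not create web server listening on port ' + port);
  phantom.exit();
}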

From the documentation:

This is intended for ease of communication between PhantomJS scripts and the outside world and is not recommended for use as a general production server. There is currently a limit of 10 concurrent requests; any other requests will be queued up.
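
For completeness, here is a rough sketch of what a client for such a server might look like. It is meant to run in Node.js, not inside PhantomJS. The function names `buildCrawlRequest` and `requestCrawl` are made up for illustration, and the `{ url: ... }` payload shape is an assumption: it only works if the server reads the target address from a `url` field of the JSON body.

```javascript
var http = require('http');

// Build the request options and JSON body for one crawl request.
// The payload shape ({ url: ... }) is an assumption, not part of PhantomJS.
function buildCrawlRequest(port, targetUrl) {
  var body = JSON.stringify({ url: targetUrl });
  return {
    options: {
      host: 'localhost',
      port: port,
      path: '/',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        'Content-Length': Buffer.byteLength(body)
      }
    },
    body: body
  };
}

// Send one request to the long-running PhantomJS process and collect the reply.
// callback(err, responseText) is called once, on error or when the reply ends.
function requestCrawl(port, targetUrl, callback) {
  var req = buildCrawlRequest(port, targetUrl);
  var clientRequest = http.request(req.options, function (res) {
    var chunks = [];
    res.on('data', function (chunk) { chunks.push(chunk); });
    res.on('end', function () {
      callback(null, Buffer.concat(chunks).toString());
    });
  });
  clientRequest.on('error', callback);
  clientRequest.end(req.body);
}
```

With the server listening on port 9494, something like `requestCrawl(9494, 'http://example.com/', function (err, text) { ... })` would reuse the same PhantomJS process for every URL instead of starting a new one each time.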

Upvotes: 1

alecxe

Reputation: 473903

For every request (each URL), I initialize a new web driver with Selenium/PhantomJS. Is this approach unscalable, and will it cost a lot of CPU over time?

This is definitely a problem. PhantomJS instances are usually heavy on CPU, and starting a new one for every URL is not a reliable way to scale. If you can reuse the same "webdriver" instance without problems or a negative impact on performance, do it. If not, look into setting up a Selenium Grid with multiple Selenium nodes: workers that actually have the browser instances running. You can also look into remote Selenium servers such as BrowserStack or Sauce Labs.
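
To illustrate the "reuse the same instance" idea, here is a minimal sketch of the pattern, not tied to any particular Selenium binding. Everything in it is hypothetical: `makeDriverHolder`, `crawl`, the `createDriver` factory and the `visit` callback are names invented for this example, and the driver is assumed to expose a synchronous interface with an optional `quit()` method. With a promise-based binding the visits would have to be awaited one at a time instead.

```javascript
// Lazily creates one driver and hands the same instance to every caller.
// createDriver is a hypothetical factory supplied by the caller, for example
// a function that starts a PhantomJS-backed WebDriver session.
function makeDriverHolder(createDriver) {
  var driver = null;
  return {
    // Return the shared driver, starting it only on first use.
    get: function () {
      if (driver === null) {
        driver = createDriver();
      }
      return driver;
    },
    // Shut down the current instance (after a crash, or every N pages)
    // so that the next get() starts a fresh one.
    reset: function () {
      if (driver !== null && typeof driver.quit === 'function') {
        driver.quit();
      }
      driver = null;
    }
  };
}

// Visit every URL with the shared driver instead of one browser per URL.
// visit(driver, url) is assumed to be synchronous in this sketch.
function crawl(urls, holder, visit) {
  return urls.map(function (url) {
    return visit(holder.get(), url);
  });
}
```

The `reset` method is there because a long-lived browser process can accumulate state or memory; restarting it occasionally keeps most of the savings while limiting that risk.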

Upvotes: 3
