Reputation: 1132
I have a script that needs to crawl a website. For every request (each URL), I initialize a new web driver with Selenium/PhantomJS. Is this approach unscalable, and will it cost a lot of CPU over time? Should I instead create a single driver, store it in a global variable, and reuse it for all requests? Would that lower CPU usage, or would it not make much difference?
Upvotes: 1
Views: 1718
Reputation: 3510
PhantomJS has an embedded web server module (based on Mongoose) that can listen for and handle requests. Running one long-lived PhantomJS process this way avoids initializing a new instance for every request, which matters because warming up PhantomJS is quite costly.
Here is a sample PhantomJS web server script you could start from:
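var port = 9494;
var server = require('webserver').create();
// One page object, created once and reused for every request
var page = require('webpage').create();

// Runs inside the page context; receives the parsed request body
var your_method = function (data) {
    // Do stuff here and return a serializable result
};

var service = server.listen(port, function (request, response) {
    // Assumes the client POSTs a JSON body (not form-encoded) such as {"url": "..."}
    var input = JSON.parse(request.post);
    page.open(input.url, function (status) {
        var result = page.evaluate(your_method, input);
        response.statusCode = 200;
        response.setHeader('Content-Type', 'application/json');
        response.write(JSON.stringify({ status: status, result: result }));
        response.close(); // finish the response so the client is not left waiting
    });
});

if (service) {
    console.log('Server running on port ' + port);
} else {
    console.log('Error: Could not create web server listening on port ' + port);
    phantom.exit();
}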
From the documentation:
This is intended for ease of communication between PhantomJS scripts and the outside world and is not recommended for use as a general production server. There is currently a limit of 10 concurrent requests; any other requests will be queued up.
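Because requests beyond that limit are simply queued, a client may prefer to cap its own in-flight requests at the same number. Here is a minimal sketch, assuming a Node.js client; `runLimited` and the task functions are hypothetical names, and each task would be whatever actually sends one request to the PhantomJS server (for example, a POST to port 9494).

```javascript
// Run async tasks with at most `limit` in flight at once; the rest wait
// their turn, similar to how the PhantomJS web server queues requests
// beyond its concurrency limit. Results keep the order of `tasks`.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next task to start (safe: JS is single-threaded)

  // Each worker keeps taking the next unstarted task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }

  const workers = [];
  for (let w = 0; w < Math.min(limit, tasks.length); w++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

With `limit` set to 10, the client never has more requests outstanding than the server processes concurrently, so any waiting happens on the client side.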
Upvotes: 1
Reputation: 473903
For every request (each URL), I initialize a new web driver with Selenium/PhantomJS. Is this approach unscalable and will it cost a lot of CPU usage over time?
This is definitely a problem. PhantomJS instances are usually heavy on CPU, and starting a new one for every URL is not a reliable way to scale. If you can reuse the same WebDriver instance without problems or a negative impact on performance, do it. If not, look into setting up a Selenium Grid with multiple Selenium nodes: workers that actually run the browser instances. You can also look into remote Selenium services such as BrowserStack or Sauce Labs.
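The reuse approach can be sketched like this. It is a minimal sketch, assuming Node.js and an asynchronous driver object with `get`, `getPageSource` and `quit` methods (the `WebDriver` class of the `selenium-webdriver` package has methods with these names). `crawlAll` and `createDriver` are hypothetical names; `createDriver` stands in for whatever builds your driver, for example something wrapping `new Builder().forBrowser(...).build()`.

```javascript
// Reuse one expensive driver for many URLs instead of creating one per URL.
// `createDriver` is a hypothetical async factory supplied by the caller.
async function crawlAll(urls, createDriver) {
  const driver = await createDriver(); // started once, not once per URL
  const pages = [];
  try {
    for (const url of urls) {
      await driver.get(url);                    // navigate the same session
      pages.push(await driver.getPageSource()); // collect the page HTML
    }
  } finally {
    await driver.quit(); // release the browser even if a page fails
  }
  return pages;
}
```

Keeping the driver inside one function call (rather than in a global variable) still gives a single instance for the whole crawl, while the `try`/`finally` guarantees the browser process is shut down when the crawl ends or throws.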
Upvotes: 3