user3347814

Reputation: 1143

How to handle redirects in Node.js with Horseman.js and PhantomJS

I've recently started using horseman.js to scrape a page with Node. I can't figure out exactly how it works, and I can't find good examples on the internet.

My main goal is to log in to a platform and extract some data. I've managed to do this with PhantomJS, but now I want to learn how to do it with horseman.js.

My code should open the login page, fill the login and password inputs and click on the "login" button. Pretty easy so far. However, after clicking on the "login" button the site makes 2 redirects before loading the actual page where I want to work.

My problem is that I don't know how to make my code wait for that page.

With PhantomJS I had a workaround based on the page URL. The following code shows how I've managed to do it with PhantomJS, and it works just fine:

var page = require('webpage').create();

var urlHome = 'http://akna.com.br/site/montatela.php?t=acesse&header=n&footer=n';

var fillLoginInfo = function(){
    $('#cmpLogin').val('mylogin');
    $('#cmpSenha').val('mypassword');
    $('.btn.btn-default').click();
};

page.onLoadFinished = function(){

    var url = page.url;
    console.log("Page Loaded: " + url);

    if(url == urlHome){
        page.evaluate(fillLoginInfo);
        return;
    }

    // After the redirects the URL has a "sid" parameter; I wait for that to appear when the page loads.
    else if(url.indexOf("sid=") > 0){
        // Keep struggling with more code!
        return;
    }

}

page.open(urlHome);

However, I can't find a way to handle the redirects with horseman.js.

Here is what I've been trying with horseman.js, without any success:

var Horseman = require("node-horseman");
var horseman = new Horseman();

var urlHome = 'http://akna.com.br/site/montatela.php?t=acesse&header=n&footer=n';

var fillLoginInfo = function(){
  $('#cmpLogin').val('myemail');
  $('#cmpSenha').val('mypassword');
  $('.btn.btn-default').click();
}

var okStatus = function(){
  return horseman.status();
}

horseman
  .open(urlHome)
  .type('input[name="cmpLogin"]','myemail')
  .type('input[name="cmpSenha"]','mypassword')
  .click('.btn-success')
  .waitFor(okStatus, 200)
  .screenshot('image.png')
  .close();

How do I handle the redirects?

Upvotes: 0

Views: 1167

Answers (1)

MrDoughnut

Reputation: 116

I'm currently solving the same problem, and my best solution so far is to use the waitForSelector method to target something on the final page.

E.g.

horseman
  .open(urlHome)
  .type('input[name="cmpLogin"]','myemail')
  .type('input[name="cmpSenha"]','mypassword')
  .click('.btn-success')
  .waitForSelector("#loginComplete")
  .screenshot('image.png')
  .close();

Of course, this only works if you know of an element that appears on the page you're waiting for.
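If no stable selector is known, one possible alternative is to reuse the URL check from the question's PhantomJS version. This is only a sketch: it assumes the final URL contains "sid=" (as the question describes) and that waitFor takes a function evaluated in the page plus the value to wait for. The names urlHasSessionId and waitForSessionUrl are hypothetical, and the horseman instance is passed in rather than created here.

```javascript
// Runs inside the page (PhantomJS context), so it may only use page globals.
// Assumption: the post-login URL contains "sid=", as in the question.
function urlHasSessionId() {
  return window.location.href.indexOf('sid=') > 0;
}

// Hypothetical helper: ask Horseman to keep checking until the function
// above returns true.
function waitForSessionUrl(horseman) {
  return horseman.waitFor(urlHasSessionId, true);
}

// Usage (needs node-horseman and PhantomJS installed):
//   waitForSessionUrl(horseman.open(urlHome).click('.btn-success'))
//     .screenshot('image.png')
//     .close();
```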

If you know there are two redirects, you can call .waitForNextPage() twice. If you didn't know how many redirects to expect, a naive approach would be to chain these calls until a timeout is reached (I don't recommend this, as it will be slow!).
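A rough sketch of the "call it once per redirect" idea, in case it helps. The helper name loginAndWait and its redirects parameter are made up for illustration; the selectors and credentials are the placeholders from the question, and the horseman instance is passed in. Whether two calls are actually enough depends on how the site performs its redirects.

```javascript
// Hypothetical helper: log in, then wait for a known number of page loads.
function loginAndWait(horseman, urlHome, redirects) {
  var chain = horseman
    .open(urlHome)
    .type('input[name="cmpLogin"]', 'myemail')
    .type('input[name="cmpSenha"]', 'mypassword')
    .click('.btn-success');

  // One waitForNextPage() call per expected redirect.
  for (var i = 0; i < redirects; i++) {
    chain = chain.waitForNextPage();
  }
  return chain;
}

// Usage (needs node-horseman and PhantomJS installed):
//   var Horseman = require('node-horseman');
//   loginAndWait(new Horseman(), urlHome, 2).screenshot('image.png').close();
```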

Perhaps a cleverer way is to use events to capture the redirects, such as .on('navigationRequested') or .on('urlChanged').
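Something along these lines might work for the event approach. It is a sketch, not tested against the site: it assumes the urlChanged callback receives the new URL and that .on() is chainable, and it reuses the "sid=" check from the question. The names hasSessionId and trackRedirects are hypothetical.

```javascript
// Same check as the question's PhantomJS version: the final URL carries "sid=".
function hasSessionId(url) {
  return url.indexOf('sid=') > 0;
}

// Hypothetical helper: record every URL change on the passed-in horseman
// instance and call onFinal when the post-login URL shows up.
function trackRedirects(horseman, seen, onFinal) {
  return horseman.on('urlChanged', function (targetUrl) {
    seen.push(targetUrl);
    if (hasSessionId(targetUrl)) {
      onFinal(targetUrl);
    }
  });
}

// Usage (needs node-horseman and PhantomJS installed):
//   var seen = [];
//   trackRedirects(horseman, seen, function (url) {
//     console.log('Logged in: ' + url);
//   }).open(urlHome);
```

Registering the handler before .open() means the redirects triggered by the login click are not missed.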

Although it doesn't answer your question directly, this link may help: https://github.com/ariya/phantomjs/issues/11507

Upvotes: 1
