sdbbs

Reputation: 5384

Inspecting AJAX loaded JS objects/class with CasperJS?

I'm using the same example as in "Checking JavaScript AJAX loaded resources with Mink/Zombie in PHP?":

test_JSload.php

<?php
if (array_key_exists("QUERY_STRING", $_SERVER)) {
  if ($_SERVER["QUERY_STRING"] == "getone") {
    echo "<!doctype html>
  <html>
  <head>
  <script src='test_JSload.php?gettwo'></script>
  </head>
  </html>
  ";
    exit;
  }

  if ($_SERVER["QUERY_STRING"] == "gettwo") {
    header('Content-Type: application/javascript');
    echo "
  function person(firstName) {
    this.firstName = firstName;
    this.changeName = function (name) {
        this.firstName = name;
    };
  }
  ";
    exit;
  }
}
?>
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
  <style type="text/css">
.my_btn { background-color:yellow; }
  </style>
  <script src="http://code.jquery.com/jquery-1.12.4.min.js"></script>
  <script type="text/javascript">
var thishref = window.location.href.slice(0, window.location.href.indexOf('?')+1);
var qstr = window.location.href.slice(window.location.href.indexOf('?')+1);

function OnGetdata(inbtn) {
  console.log("OnGetdata; loading ?getone via AJAX call");
  //~ $.ajax(thishref + "?getone", { // works
  var ptest = {}; // init as empty object
  console.log(" ptest pre ajax is ", ptest);

  $.ajax({url: thishref + "?getone",
    async: true, // still "Synchronous XMLHttpRequest on the main thread is deprecated", because we load a script; https://stackoverflow.com/q/24639335
    success: function(data) {
      console.log("got getone data "); //, data);
      $("#dataholder").html(data);
      ptest = new person("AHA");
      console.log(" ptest post getone is ", ptest);
    },
    error: function(xhr, ajaxOptions, thrownError) {
      console.log("getone error " + thishref + " : " + xhr.status + " / " + thrownError);
    }
  });

  ptest.changeName("Somename");
  console.log(" ptest post ajax is ", ptest);
}

ondocready = function() {
  $("#getdatabtn").click(function(){
    OnGetdata(this);
  });
}
$(document).ready(ondocready);
  </script>
</head>


<body>
  <h1>Hello World!</h1>

  <button type="button" id="getdatabtn" class="my_btn">Get Data!</button>
  <div id="dataholder"></div>
</body>
</html>

Then you can run a temporary server with the PHP CLI (PHP 5.4 or later) from the directory that contains the .php file:

php -S localhost:8080

... and then finally, you can visit the page at http://127.0.0.1:8080/test_JSload.php.

Simply put, when the button on this page is clicked, a JavaScript class is loaded in two passes: first an HTML document comes in with a <script> tag, and the script it references is then loaded in the second pass. For this action, Firefox prints the following in the Console:

OnGetdata; loading ?getone via AJAX call      test_JSload.php:13:3
 ptest pre ajax is  Object {  }               test_JSload.php:16:3
TypeError: ptest.changeName is not a function test_JSload.php:31:3
got getone data                               test_JSload.php:21:7
Synchronous XMLHttpRequest on the main thread is deprecated because of its detrimental effects to the end user's experience. For more help http://xhr.spec.whatwg.org/ jquery-1.12.4.min.js:4:26272
 ptest post getone is  Object { firstName: "AHA", changeName: person/this.changeName(name) } test_JSload.php:24:7

Ultimately, I would like to inspect either the ptest variable or the person class in CasperJS. So far, I have made this script:

test_JSload_casper.js

// run with:
// ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
// based on http://code-epicenter.com/how-to-login-to-amazon-using-casperjs-working-example/

var casper = require('casper').create({
  pageSettings: {
    loadImages: false,//The script is much faster when this field is set to false
    loadPlugins: false,
    userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36'
  }
});

//First step is to open page
casper.start().thenOpen("http://127.0.0.1:8080/test_JSload.php", function() {
  console.log("website opened");
});

//Second step is to click to the button
casper.then(function(){
   this.evaluate(function(){
    document.getElementById("getdatabtn").click();
   });
});

//Wait for JS to execute?!, then inspect
casper.then(function(){
  console.log("After login...");
  console.log("AA " + JSON.stringify(person));
});

casper.run();

... however, when I run this CasperJS script, I get just:

$ ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js
website opened
After login...

... and nothing else. Note that the last line console.log("AA " + JSON.stringify(person)); doesn't execute even partially (i.e., no "AA " is printed, nor any sort of error message).

So, is it possible to use CasperJS to inspect resources like these (AJAX loaded JS objects/classes, possibly loaded over multiple runs/steps) - and if so, how?

Upvotes: 1

Views: 574

Answers (2)

Artjom B.

Reputation: 61892

The Ajax request that is triggered by the click might not have had enough time to affect the page you're scraping. Make sure to wait for its completion with one of the many wait* functions. If the DOM is changed as a result of the Ajax request, then I suggest waitForSelector.
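Since the thing of interest here is a JavaScript global rather than a DOM element, waiting on a condition may fit better than waitForSelector. A minimal sketch, assuming the casper instance from the question: `personIsLoaded` and `addWaitForPerson` are hypothetical helper names, while `waitFor`, `evaluate` and `echo` are regular CasperJS methods.

```javascript
// Runs inside the page (via casper.evaluate): true once the ?gettwo script
// has defined the global `person` constructor.
function personIsLoaded() {
  return typeof person === "function";
}

// Hypothetical helper: adds a wait step to an existing casper instance.
function addWaitForPerson(casper, timeoutMs) {
  casper.waitFor(
    function check() {
      // Polled repeatedly until it returns true or the timeout expires.
      return this.evaluate(personIsLoaded);
    },
    function then() {
      this.echo("person is defined in the page");
    },
    function onTimeout() {
      this.echo("person was not defined in time", "ERROR");
    },
    timeoutMs
  );
}
```

In the script from the question, this would be called as `addWaitForPerson(casper, 5000);` after the step that clicks the button and before the step that inspects the page.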

A related problem is that the page's JavaScript is broken. Since the Ajax request that populates ptest is asynchronous, ptest.changeName("Somename") is executed before the response has arrived, which leads to a TypeError. You can move ptest.changeName(...) into the success callback of the Ajax request.
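The ordering problem can be reproduced outside the browser. This is only a sketch: `fakeAjax` and `deliverResponse` are hypothetical stand-ins for `$.ajax` and for the response arriving later, and `person` is copied from the question.

```javascript
// Constructor copied from the question's ?gettwo script.
function person(firstName) {
  this.firstName = firstName;
  this.changeName = function (name) {
    this.firstName = name;
  };
}

// Hypothetical stand-in for $.ajax: it only stores the success callback,
// the way an asynchronous request returns before the response arrives.
var pendingSuccess = null;
function fakeAjax(options) {
  pendingSuccess = options.success;
}

// Hypothetical helper that simulates the response arriving later.
function deliverResponse(data) {
  pendingSuccess(data);
}

var log = [];

function onGetData() {
  var ptest = {}; // placeholder until the response arrives
  fakeAjax({
    url: "test_JSload.php?getone",
    success: function (data) {
      // Everything that needs the loaded class belongs in here.
      ptest = new person("AHA");
      ptest.changeName("Somename");
      log.push("in success: firstName = " + ptest.firstName);
    }
  });
  // ptest is still {} here, so calling ptest.changeName(...) at this point
  // would throw the TypeError from the question.
  log.push("after ajax call: changeName is " + typeof ptest.changeName);
}

onGetData();
deliverResponse("<html></html>");
console.log(log.join("\n"));
```

The line after the ajax call is logged first, which is why the call to changeName has to live inside the success callback.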

In order to see console messages from the page, you have to listen to the 'remote.message' event:

casper.on("remote.message", function(msg){
    this.echo("remote> " + msg);
});

casper.start(...)...

Upvotes: 1

sdbbs

Reputation: 5384

I'll post this as a partial answer, since I at least managed to print the person class. The trick is to use casper.evaluate to run the script (i.e. a console.log(person)) as if it were in the remote page (see below). However, some issues are still unclear to me (and I'll gladly accept an answer that clarifies them):

  • The person class should exist only after the ?gettwo request has completed and the corresponding JS has been retrieved; however, casperjs reports only that calls to ?getone were made, not to ?gettwo. Why?
  • If I use JSON.stringify(person) or __utils__.echo('plop'); in the final .then(...), the script execution is interrupted, as if there was a fatal error - however, no related error is reported, even though I listen to multiple events. Why?

Otherwise, here is the modified test_JSload_casper.js file:

// run with:
// ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_JSload_casper.js

var casper = require('casper').create({
  verbose: true,
  logLevel: 'debug',
  userAgent: 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36',
  pageSettings: {
    loadImages: false,//The script is much faster when this field is set to false
    loadPlugins: false
  }
});


casper.on('remote.message', function(message) {
  this.echo('remote message caught: ' + message);
});

casper.on('resource.received', function(resource) {
  var status = resource.status;
  casper.log('Resource received ' + resource.url + ' (' + status + ')');
});

casper.on("resource.error", function(resourceError) {
  this.echo("Resource error: " + "Error code: "+resourceError.errorCode+" ErrorString: "+resourceError.errorString+" url: "+resourceError.url+" id: "+resourceError.id, "ERROR");
});

casper.on("page.error", function(msg, trace) {
  this.echo("Page Error: " + msg, "ERROR");
});

// http://docs.casperjs.org/en/latest/events-filters.html#page-initialized
casper.on("page.initialized", function(page) {
  // CasperJS doesn't provide `onResourceTimeout`, so it must be set through
  // the PhantomJS means. This is only possible when the page is initialized
  page.onResourceTimeout = function(request) {
    console.log('Response Timeout (#' + request.id + '): ' + JSON.stringify(request));
  };
});


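//First step is to open page
casper.start().thenOpen("http://127.0.0.1:8080/test_JSload.php", function() {
  console.log("website opened");
});

//Second step is to click to the button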
casper.then(function(){
   this.evaluate(function(){
    document.getElementById("getdatabtn").click();
   });
   //~ this.wait(2000, function() { // fires, but ?gettwo never gets listed
    //~ console.log("Done waiting");
   //~ });

  //~ this.waitForResource(/\?gettwo$/, function() { // does not ever fire: "Wait timeout of 5000ms expired, exiting."
    //~ this.echo('a gettwo has been loaded.');
  //~ });
});

//Wait for JS to execute?!, then inspect
casper.then(function(){
  console.log("After login...");

  // Code inside of this function will run
  // as if it was placed inside the target page.
  casper.evaluate(function(term) {
    //~ console.log("EEE", ptest); // Page Error: ReferenceError: Can't find variable: ptest
    console.log("EEE", person); // does dump the class function
  });

  __utils__.echo('plop'); // script BREAKS here....
  console.log("BB ");
  console.log("AA " + JSON.stringify(person));
});

casper.run();

The output of this is:

$ ~/.nvm/versions/node/v4.0.0/lib/node_modules/casperjs/bin/casperjs test_php_mink/test_JSload_casper.js 
[info] [phantom] Starting...
[info] [phantom] Running suite: 4 steps
[debug] [phantom] opening url: http://127.0.0.1:8080/test_JSload.php, HTTP GET
[debug] [phantom] Navigation requested: url=http://127.0.0.1:8080/test_JSload.php, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php (200)
[debug] [phantom] url changed to "http://127.0.0.1:8080/test_JSload.php"
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php (200)
[debug] [phantom] Resource received http://code.jquery.com/jquery-1.12.4.min.js (200)
[debug] [phantom] Resource received http://code.jquery.com/jquery-1.12.4.min.js (200)
[debug] [phantom] Successfully injected Casper client-side utilities
[info] [phantom] Step anonymous 2/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
website opened
[info] [phantom] Step anonymous 2/4: done in 312ms.
[info] [phantom] Step anonymous 3/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
remote message caught: OnGetdata; loading ?getone via AJAX call
remote message caught:  ptest pre ajax is  [object Object]
Page Error: TypeError: undefined is not a function (evaluating 'ptest.changeName("Somename")')
[info] [phantom] Step anonymous 3/4: done in 337ms.
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php?getone (200)
[debug] [phantom] Resource received http://127.0.0.1:8080/test_JSload.php?getone (200)
remote message caught: got getone data 
remote message caught:  ptest post getone is  [object Object]
[info] [phantom] Step anonymous 4/4 http://127.0.0.1:8080/test_JSload.php (HTTP 200)
After login...
remote message caught: EEE function person(firstName) {
    this.firstName = firstName;
    this.changeName = function (name) {
        this.firstName = name;
    };
  }
[debug] [phantom] Navigation requested: url=about:blank, type=Other, willNavigate=true, isMainFrame=true
[debug] [phantom] url changed to "about:blank"

As can be seen from the "EEE" message, the person class (function) is reported correctly - even though http://127.0.0.1:8080/test_JSload.php?gettwo (which defines it) is never listed as a loaded resource.
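As for the lines that break the script: as far as I understand it, both person and __utils__ exist only inside the page, so referring to them directly in a .then() callback (which runs in the CasperJS/PhantomJS context) presumably fails on an undefined variable. Values also have to be serializable to come back from evaluate, so a function such as person cannot be returned as-is. A minimal sketch of pulling plain data out instead, where `describePerson` and `addInspectStep` are hypothetical helper names:

```javascript
// Runs inside the page (via casper.evaluate). Returns only plain data,
// because functions do not survive the trip back to the CasperJS script.
function describePerson() {
  if (typeof person !== "function") {
    return null; // class not loaded (yet)
  }
  var instance = new person("AHA");
  instance.changeName("Somename");
  return {
    source: String(person),        // source text of the constructor
    firstName: instance.firstName, // state after changeName()
    hasChangeName: typeof instance.changeName === "function"
  };
}

// Hypothetical helper: adds an inspection step to an existing casper instance.
function addInspectStep(casper) {
  casper.then(function () {
    // evaluate() runs describePerson in the page and returns its result here.
    var info = this.evaluate(describePerson);
    this.echo("AA " + JSON.stringify(info));
  });
}
```

Calling `addInspectStep(casper);` before `casper.run();` would replace the final step above; the "AA" line is then built entirely from data returned by evaluate rather than from page variables.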

Upvotes: 0
