fearless_fool

Reputation: 35189

extracting table elements with CasperJS

I'd like to use CasperJS to extract successive (inner) HTML fields from a table into a list. I know it's easy to extract successive element attributes from a table, but I can't figure out how to extract successive HTML fields.

To demonstrate, here's a simple HTML table:

<html>
  <head></head>
  <body>
    <table>
      <tbody>
        <tr><td name="a">1</td><td>hop</td></tr>
        <tr><td name="b">2</td><td>skip</td></tr>
        <tr><td name="c">3</td><td>jump</td></tr>
      </tbody>
    </table>
  </body>
</html>

And here's a complete CasperJS program to extract bits from the table:

"use strict";
var casper = require('casper').create();

casper.start('file:///tmp/casper-so.html');

// I want to print the list '["a", "b", "c"]'
casper.then(function a1() {
    var names = casper.getElementsAttribute('table tr td[name]', 'name');
    // prints ["a", "b", "c"] as desired...
    console.log(JSON.stringify(names, null, 2));
});

// I want to print the list '["hop", "skip", "jump"]'
casper.then(function a2() {
    var verbs = ???;
    // What should go on the previous line in order to print 
    // ["hop", "skip", "jump"]?
    console.log(JSON.stringify(verbs, null, 2));
});

casper.run();

As commented in the code, I know how to extract all the td[name] fields using casper.getElementsAttribute(). But I haven't figured out a straightforward way to extract the inner HTML from a given column in the table. Any pointers?

Aside: What I've been doing is extracting elements one at a time, iterating with an index, using CSS that looks like table tr:nth-child(' + index + ') td:nth-child(2), but that feels rather tortured. I'm hoping to find something simpler.

Upvotes: 2

Views: 2333

Answers (2)

Aris

Reputation: 5057

Another solution is to get each td's info object with getElementInfo() and then read the text from that object. Each row holds only two cells, so select the row first and then the second cell within it:

// Run inside a casper.then() callback, where `this` is the casper instance.

// get "hop": 2nd td of the 1st row
var hop = this.getElementInfo('tr:nth-of-type(1) td:nth-of-type(2)').text.trim();

// get "skip": 2nd td of the 2nd row
var skip = this.getElementInfo('tr:nth-of-type(2) td:nth-of-type(2)').text.trim();

// get "jump": 2nd td of the 3rd row
var jump = this.getElementInfo('tr:nth-of-type(3) td:nth-of-type(2)').text.trim();
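
The per-cell lookups above can probably be collapsed into one call, assuming the installed CasperJS provides getElementsInfo() (the plural form, which I believe arrived in 1.1). A minimal sketch; columnText is a hypothetical helper name, not part of CasperJS, and it takes the casper instance as a parameter:

```javascript
"use strict";

// Hypothetical helper (not part of CasperJS): returns the trimmed text of
// every element matching `selector`. Assumes `casper` exposes
// getElementsInfo(), whose result objects carry a `text` property.
function columnText(casper, selector) {
    return casper.getElementsInfo(selector).map(function (info) {
        return info.text.trim();
    });
}

// Intended use inside a step (assumes the table from the question):
// casper.then(function () {
//     var verbs = columnText(this, 'table tr td:nth-child(2)');
//     console.log(JSON.stringify(verbs, null, 2));
// });
```

This reads the text property; if the markup inside each cell is wanted instead, the info objects should also carry an html property.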

Upvotes: 0

fearless_fool

Reputation: 35189

Here's a solution, cribbed heavily from casper's definition of getElementsAttribute():

// print the list '["hop", "skip", "jump"]'
casper.then(function a2() {
    var verbs = casper.evaluate(function () {
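        // Runs in the page context: collect the second cell of every row,
        // then map over the result to pull out each cell's inner HTML.
        var cells = __utils__.findAll('table tr td:nth-child(2)');
        return [].map.call(cells, function (e) { return e.innerHTML; });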
    });
    console.log(JSON.stringify(verbs, null, 2));
});
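
The [].map.call(...) idiom is there because a DOM query may hand back an array-like collection (a NodeList, for example) that has no map() of its own. Here is a small sketch of the idiom outside CasperJS; the cells object is a made-up stand-in for the matched td nodes, not anything CasperJS returns:

```javascript
"use strict";

// Stand-in for the matched <td> nodes: indexed entries plus a length,
// but no map() method of its own.
var cells = {
    0: { innerHTML: 'hop' },
    1: { innerHTML: 'skip' },
    2: { innerHTML: 'jump' },
    length: 3
};

// Borrow Array.prototype.map and run it with `cells` as `this`.
var verbs = [].map.call(cells, function (e) { return e.innerHTML; });

console.log(JSON.stringify(verbs)); // ["hop","skip","jump"]
```

Swapping e.innerHTML for e.textContent in the callback would return the cell text without any nested markup.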

Upvotes: 5
