How to retrieve "just text" from the document with JavaScript in CasperJS

Question

I want to know how to select plain text in the body with JavaScript. The text has no name, id, or tag of its own; it is just text. The body has no name or id either.

How can I select this text with JavaScript in CasperJS?

Here is the site's HTML:



site title


I don't want to scrape here

        TOP  一つ戻る
    
    I don't want to scrape here either　abcdef
    "
        2015年07月16日 10時50分時点" <--------- I want to scrape here!
    

    ..
    

    
        TOP  一つ戻る
    
    
    (c)company name

And here is my code:

var casper = require('casper').create({
clientScripts: ["includes/jquery-2.1.3.min.js"],
verbose: true,
logLevel: 'debug',
pageSettings: {
    webSecurityEnabled: false
}
});
var fs = require('fs'); 
var rli;
var result = null;
var pattern = /<[^>]+>/g;
var rui;
var lists; // holds the array returned by this.evaluate() below



casper.start();

casper.then(function() {
    var current = 1;
    var end = 2;

    for (;current < end;) {

      (function(cntr) {

        casper.thenOpen('http://site/0'+cntr+'/' , function() {
              this.echo('casper.async: '+cntr+casper.getCurrentUrl());
              // here we can download stuff

             lists = this.evaluate(function () { 

        var elements = document.querySelectorAll('ui'); // scraping ui is okay


        result= Array.prototype.map.call(elements, function (element) {
            return element.innerText + ' [ ***here I want to save the upper date data*** ]'; // 

        });
        return result;
    });

    this.echo(lists.length); 
    this.echo(lists.join('\n')); 

             // casper.capture( 'capture'+cntr+'.png' );

              fs.write('results'+cntr+'.txt', lists); 
        });
      })(current);

      current++;
    }
});

casper.run(function() {
    this.echo('Done.').exit();
});

Artjom B. · Accepted Answer

Let's identify what this is. It is a text node inside of a div container. You won't get far with CSS selectors, because they only work on actual elements and not TextNodes.
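To make the difference between elements and text nodes concrete, here is a minimal sketch. The names `directTextOf` and `fakeDiv` are hypothetical, and the plain objects only imitate DOM nodes (same `childNodes`, `nodeType` and `textContent` properties) so the snippet can run outside a browser. The child layout is an assumption, not the real page.

```javascript
// Node type values as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
var ELEMENT_NODE = 1;
var TEXT_NODE = 3;

// Hypothetical helper: collect the non-blank text of a container's direct
// text-node children, ignoring child elements entirely.
function directTextOf(container) {
    var texts = [];
    for (var i = 0; i < container.childNodes.length; i++) {
        var node = container.childNodes[i];
        if (node.nodeType === TEXT_NODE) {
            var trimmed = node.textContent.replace(/^\s+|\s+$/g, '');
            if (trimmed !== '') {
                texts.push(trimmed);
            }
        }
    }
    return texts;
}

// Stand-in for a real container: plain objects, assumed layout.
var fakeDiv = {
    childNodes: [
        { nodeType: TEXT_NODE, textContent: '\n    ' },
        { nodeType: ELEMENT_NODE, textContent: 'heading text' },
        { nodeType: TEXT_NODE, textContent: '\n    loose text\n    ' },
        { nodeType: ELEMENT_NODE, textContent: 'list text' }
    ]
};

console.log(directTextOf(fakeDiv)); // only the loose text node survives
```

A CSS selector can return the two element children here, but never the two text children, which is why the approaches in this answer start from a neighbouring element.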

With XPath

You could iterate over those elements in plain JavaScript until you reach the text node, but I prefer XPath expressions. CasperJS provides a helper utility for them:

var x = require('casper').selectXPath;
...
var text = casper.fetchText(x("//body/div[@align='right']/h3/following-sibling::node()[1]"));
casper.echo(text);

The expression is mostly self-explanatory. The first part (`//body/div[@align='right']/h3`) matches the `<h3>` element directly before the text that you want to retrieve. `following-sibling::node()[1]` is a little trickier. `following-sibling::node()` matches every node of any type (text nodes included) that comes after the current node (the h3) under the same parent. `[1]` keeps only the first of those.

You can do the same thing with `//ui/preceding-sibling::node()[1]`.
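The two sibling axes can be modelled with a plain array to show why `[1]` picks the node right next to the context node in both cases. This is only an illustration of the XPath semantics, not how CasperJS evaluates the expression. The function names and the `siblings` layout (in particular the trailing `div`) are assumptions.

```javascript
// Model the children of the container as an ordered list of node names.
// '#text' stands for the text node; the other names stand for elements.
var siblings = ['h3', '#text', 'ui', 'div'];

// following-sibling::node()[1]: a forward axis, so position 1 is the
// first node after the context node in document order.
function firstFollowingSibling(nodes, contextIndex) {
    var following = nodes.slice(contextIndex + 1);
    return following.length > 0 ? following[0] : null;
}

// preceding-sibling::node()[1]: a reverse axis, so position 1 is the node
// closest to the context node, not the first child of the parent.
function firstPrecedingSibling(nodes, contextIndex) {
    var preceding = nodes.slice(0, contextIndex).reverse();
    return preceding.length > 0 ? preceding[0] : null;
}

console.log(firstFollowingSibling(siblings, siblings.indexOf('h3'))); // #text
console.log(firstPrecedingSibling(siblings, siblings.indexOf('ui'))); // #text
```

Both calls land on the same `#text` entry, which mirrors why the two XPath expressions are interchangeable for this page.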

With JavaScript

You can do the same thing with JavaScript:

var text = casper.evaluate(function(){
    return document.querySelector("body > div[align='right'] > h3").nextSibling.textContent;
});

or

var text = casper.evaluate(function(){
    return document.querySelector("ui").previousSibling.textContent;
});
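Both snippets assume that the wanted text node is the immediate sibling. If the markup might contain a comment or a whitespace-only text node in between, a slightly more defensive walk could look like the sketch below. `nextNonBlankText` is a hypothetical helper, and the plain objects imitate DOM nodes so the snippet runs outside a browser.

```javascript
var TEXT_NODE = 3; // value of Node.TEXT_NODE in the DOM

// Hypothetical helper: walk forward from `node`, skipping every non-text node
// and every whitespace-only text node, and return the trimmed content of the
// first text node with real content, or null if there is none.
function nextNonBlankText(node) {
    var current = node.nextSibling;
    while (current) {
        if (current.nodeType === TEXT_NODE) {
            var trimmed = current.textContent.replace(/^\s+|\s+$/g, '');
            if (trimmed !== '') {
                return trimmed;
            }
        }
        current = current.nextSibling;
    }
    return null;
}

// Stand-in sibling chain: h3 -> comment -> text (assumed layout).
var textNode = { nodeType: TEXT_NODE, textContent: '\n    some loose text\n', nextSibling: null };
var commentNode = { nodeType: 8, textContent: ' a comment ', nextSibling: textNode };
var h3 = { nodeType: 1, textContent: 'heading', nextSibling: commentNode };

console.log(nextNonBlankText(h3)); // some loose text
```

In CasperJS the helper would have to be defined inside the function passed to `casper.evaluate`, because that function runs in the page context and cannot see variables from the script's outer scope.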
