ILikeTacos

Reputation: 18676

How to scrape the values of the n-th column of a table with pure JS?

I've been looking for this answer on the site, but all the answers seem to be jQuery-specific.

I'm building a scraper using CasperJS, and I can't find the right method to select the values of column n, where n is an arbitrary number that I specify.

I'm selecting the table specifically this way:

document.querySelector('table.table-responsive.table-noborder');

and I get back an HTMLTableElement, but from there I don't know how to get the contents of a specific column without iterating over the whole table (which is what I've ended up doing to get the data).

Thanks!

Upvotes: 0

Views: 980

Answers (2)

Artjom B.

Reputation: 61912

You can write your own function that plugs nicely into casper. This iterates over the nth td or th in all the rows and writes the value of the innerText property into a result array:

casper.tableColumnText = function(tableSelector, columnNumber, withHeader, merged){
    // columnNumber starts with 1
    var texts = this.evaluate(function(tableSelector, columnNumber, withHeader){
        var headerFields = document.querySelectorAll(tableSelector + " > thead > tr > th:nth-child("+columnNumber+")"),
            bodyFields = document.querySelectorAll(tableSelector + " > tbody > tr > td:nth-child("+columnNumber+")"),
            result = [];
        if (withHeader) {
            Array.prototype.forEach.call(headerFields, function(headerField){
                result.push(headerField.innerText);
            });
        }
        Array.prototype.forEach.call(bodyFields, function(bodyField){
            result.push(bodyField.innerText);
        });
        return result;
    }, tableSelector, columnNumber, withHeader);
    if (merged) {
        return texts.join(' ');
    }
    return texts;
};

The browser (PhantomJS) inserts a tbody element while parsing the table, even if it is not present in the original markup, so the "> tbody > tr" part of the selector still matches.

Upvotes: 1

Avnish Gaur

Reputation: 489

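You can use document.getElementsByTagName('table')[m] to reach the table at index m on the page.

In the same way, you can traverse inside the table and read a cell's content with innerHTML or textContent. Use the row's cells collection rather than childNodes, because childNodes can include whitespace text nodes, and note that nodeValue is null for element nodes, so it will not give you the cell text:

document.getElementsByTagName('table')[2].getElementsByTagName('tr')[1].cells[0].innerHTML
document.getElementsByTagName('table')[2].getElementsByTagName('tr')[1].cells[0].textContent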
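
To turn the single-cell lookup above into a whole column, something like the following might work. It is only a sketch: columnText is a hypothetical helper name (not something from CasperJS or the DOM), the column number is assumed to be 1-based, and rows with fewer cells than requested are simply skipped. It relies only on the standard rows and cells collections of a table element.

```javascript
// Hypothetical helper (not part of any library): collects the text of the
// n-th column (1-based) from a table element, or from any object that
// exposes `rows`, each with a `cells` collection.
function columnText(table, columnNumber) {
    var result = [];
    // Array.prototype.forEach.call works on HTMLCollections as well as arrays
    Array.prototype.forEach.call(table.rows, function (row) {
        var cell = row.cells[columnNumber - 1];
        // Assumption: rows shorter than the requested column are skipped
        if (cell) {
            result.push(cell.textContent.trim());
        }
    });
    return result;
}
```

In the page context you could call it as columnText(document.querySelector('table.table-responsive.table-noborder'), 2). It still visits every row, since HTML tables are stored row by row and the DOM has no per-column collection, but it reads only one cell per row.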

Upvotes: 3
