Reputation: 18676
I've been looking for this answer on the site, but all the answers seem to be jQuery-specific.
I'm building a scraper using CasperJS, and I can't find the right method to select the values of column n, where n is an arbitrary number that I specify.
I'm selecting the table specifically this way:
document.querySelector('table.table-responsive.table-noborder');
and I get back a table element, but from there I don't know how to get the contents of a specific column without iterating over the whole table (which is what I'm currently doing to get the data).
Thanks!
Upvotes: 0
Views: 980
Reputation: 61912
You can write your own function that plugs nicely into casper. This iterates over the nth td or th in all the rows and writes the value of the innerText property into a result array:
casper.tableColumnText = function(tableSelector, columnNumber, withHeader, merged){
    // columnNumber starts with 1
    // evaluate() runs the callback inside the page and returns its result
    var texts = this.evaluate(function(tableSelector, columnNumber, withHeader){
        var headerFields = document.querySelectorAll(tableSelector + " > thead > tr > th:nth-child("+columnNumber+")"),
            bodyFields = document.querySelectorAll(tableSelector + " > tbody > tr > td:nth-child("+columnNumber+")"),
            result = [];
        // optionally put the header cell text first
        if (withHeader) {
            Array.prototype.forEach.call(headerFields, function(headerField){
                result.push(headerField.innerText);
            });
        }
        Array.prototype.forEach.call(bodyFields, function(bodyField){
            result.push(bodyField.innerText);
        });
        return result;
    }, tableSelector, columnNumber, withHeader);
    // merged: return one space-separated string instead of an array
    if (merged) {
        return texts.join(' ');
    }
    return texts;
};
Note that tbody will be injected by the browser (PhantomJS) even if it is not present in the original markup, so the "> tbody > tr" part of the selector still matches.
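Since columnNumber is concatenated straight into the :nth-child() selector, a small guard may help catch off-by-one mistakes (the count starts at 1, not 0). The following is only a sketch; columnSelectors is a hypothetical helper name, not part of CasperJS or of the function above:

```javascript
// Hypothetical helper: build the header and body selectors for a 1-based column.
function columnSelectors(tableSelector, columnNumber) {
    // :nth-child() counts from 1, so reject 0, negatives and non-integers
    if (typeof columnNumber !== 'number' || columnNumber < 1 || columnNumber % 1 !== 0) {
        throw new Error('columnNumber must be a positive integer, got: ' + columnNumber);
    }
    return {
        header: tableSelector + ' > thead > tr > th:nth-child(' + columnNumber + ')',
        body: tableSelector + ' > tbody > tr > td:nth-child(' + columnNumber + ')'
    };
}
```

The returned strings are the same selectors that the function above passes to document.querySelectorAll.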
Upvotes: 1
Reputation: 489
You can use document.getElementsByTagName('table')[m] to reach the table at index m.
In the same way, you can traverse the table and read a cell's content with innerHTML or textContent. (nodeValue is null for element nodes, and childNodes can include whitespace text nodes, so indexing through cells is safer.)
document.getElementsByTagName('table')[2].getElementsByTagName('tr')[1].cells[0].innerHTML
document.getElementsByTagName('table')[2].getElementsByTagName('tr')[1].cells[0].textContent
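The same traversal can be wrapped into a small function that collects one column. This is a minimal sketch, assuming a plain table without colspan/rowspan; columnText is a hypothetical name, and the function relies only on the standard rows and cells collections:

```javascript
// Hypothetical helper: collect the text of column n (1-based) from a table
// element using its rows and cells collections.
function columnText(table, columnNumber) {
    var result = [];
    // table.rows covers the rows of thead, tbody and tfoot
    Array.prototype.forEach.call(table.rows, function (row) {
        var cell = row.cells[columnNumber - 1];
        // skip rows that have fewer cells than the requested column
        if (cell) {
            result.push(cell.textContent.trim());
        }
    });
    return result;
}
```

In a CasperJS script this logic would have to run inside evaluate, because the page DOM is not reachable from the Casper context. It still visits every row, but only reads one cell per row.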
Upvotes: 3