Reputation: 18666
I'm building a link scraper in CasperJS, and the main function looks pretty much like this:
function findLinks() {
    return Array.prototype.map.call(document.querySelectorAll('a'), function(e){
        return {
            href: e.href,
            title: e.title,
            rel: e.rel,
            anchor: e.text,
            innerHTML: e.innerHTML
        };
    });
}
However, I'd like to modify findLinks() so that, if my link scraper finds something like this:
<a href="#" title="anchor tag" rel="nofollow"><img src="myimage.jpg" alt="beautiful image" /></a>
I can access the <img> attributes individually, just as I do with the links.
I've been reading the Mozilla MDN and CasperJS documentation, but I haven't found a way to achieve this yet.
Any help will be greatly appreciated!
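One possible shape for the modified function is sketched below. It is an assumption, not the asker's final code: each link gains an `images` array built from `e.querySelectorAll('img')`, with the image attributes read one by one through `getAttribute`. The optional `root` parameter is a hypothetical addition for testing; because `casper.evaluate(findLinks)` serializes the function into the page, everything it needs stays inside it, and with no argument it falls back to `document`.

```javascript
// Sketch: the question's findLinks() extended to list nested <img> elements.
// ES5 only, because CasperJS runs evaluated code in a PhantomJS page context.
// `root` is a hypothetical optional parameter (defaults to `document`).
function findLinks(root) {
    root = root || document;
    return Array.prototype.map.call(root.querySelectorAll('a'), function (e) {
        return {
            href: e.href,
            title: e.title,
            rel: e.rel,
            anchor: e.text,
            innerHTML: e.innerHTML,
            // One entry per <img> inside this link; missing attributes come back as null
            images: Array.prototype.map.call(e.querySelectorAll('img'), function (img) {
                return {
                    src: img.getAttribute('src'),
                    alt: img.getAttribute('alt'),
                    title: img.getAttribute('title')
                };
            })
        };
    });
}
```

In the CasperJS script it would then be called as before, e.g. `var links = casper.evaluate(findLinks);`, and for the example markup `links[0].images[0].alt` would hold "beautiful image".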
Upvotes: 0
Views: 94
Reputation: 2930
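The Document Object Model (DOM) API is what you are looking for. Here is the site that I find useful for DOM documentation.
In your case, e.childNodes[n].attributes['src'].value would be an example. Note that attributes['src'] returns an Attr object (hence .value), and that childNodes also includes text nodes, so n must point at the <img> element.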
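A minimal sketch of that childNodes/attributes approach, assuming you want every attribute of every child element rather than a fixed list. `childAttributes` is a hypothetical helper name; it skips non-element nodes (nodeType other than 1) and copies each element's `attributes` collection into a plain name-to-value object, which also keeps the result serializable when returned from `casper.evaluate`.

```javascript
// Sketch: collect all attributes of a link's child elements.
// `childAttributes` is a hypothetical helper, not a DOM or CasperJS API.
function childAttributes(link) {
    var result = [];
    for (var i = 0; i < link.childNodes.length; i++) {
        var node = link.childNodes[i];
        if (node.nodeType !== 1) continue; // skip text and comment nodes
        var attrs = {};
        // node.attributes holds Attr objects, each with a .name and a .value
        for (var j = 0; j < node.attributes.length; j++) {
            attrs[node.attributes[j].name] = node.attributes[j].value;
        }
        result.push({ tag: node.tagName.toLowerCase(), attributes: attrs });
    }
    return result;
}
```

For the example link this would yield one entry with tag "img" and attributes src and alt. When used with CasperJS, the helper has to be defined inside the function passed to `casper.evaluate`, since only that function is sent to the page.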
Better yet, if you are doing heavy HTML tree traversal, my suggestion is to use jQuery; it is made exactly for this purpose.
Upvotes: 0
Reputation: 276266
You're looking for Element.children, which returns a live HTMLCollection of the child elements (element nodes only, no text nodes) of the given element.
In your example HTML:
var b = document.querySelectorAll('a')[0];
alert(b.children[0].src); // first child's src: the absolute URL resolved from myimage.jpg
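A small sketch of the same idea that also guards against links with no image. `firstChildImage` is a hypothetical helper. It shows one detail worth knowing: the `.src` property returns the URL resolved against the page's base URL, while `getAttribute('src')` returns the raw text written in the markup ("myimage.jpg"). The tag-name check is upper-cased because HTML documents report `IMG`, while XHTML documents report lowercase names.

```javascript
// Sketch: read the first child element of a link via Element.children.
// `firstChildImage` is a hypothetical helper, not part of the DOM API.
function firstChildImage(link) {
    var child = link.children[0];
    if (!child || child.tagName.toUpperCase() !== 'IMG') return null; // no image child
    return {
        resolvedSrc: child.src,              // absolute URL resolved by the browser
        rawSrc: child.getAttribute('src'),   // attribute text exactly as in the HTML
        alt: child.getAttribute('alt')
    };
}
```

As with the snippet above, this has to run in the page context (for example inside the function handed to `casper.evaluate`), since `document` is not available in the CasperJS script itself.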
Upvotes: 1