Reputation: 1168
I would like to crawl the iframe content (dynamic content) of a webpage,
but none of the crawlers I have tried so far (Aperture, Crawl4j) support this. The result I get is just the iframe tag itself:
<iframe id="template_content_frame" src="/ee/mypage/default.htm" width="100%" frameborder="0" name="content_frame">
</iframe>
So I started looking at Crawljax. Does it support crawling iframe contents? I came across this issue, but it is marked Closed rather than Fixed, so I am not sure whether Crawljax supports this or not.
Has anyone tried this before, or does anyone have another solution for crawling dynamic content such as iframes?
Upvotes: 1
Views: 658
Reputation: 81
Norconex HTTP Collector is an open source enterprise web crawler that supports crawling of frame and iframe tags out of the box. You can also add your own set of tags to be used to extract URLs (e.g., frame.longdesc, video.src, form.action). You need no programming skills to use this crawler, but since you seem to know your Java, you can also plug in your own URL-extraction logic if you prefer.
Once you get more familiar with this crawler, I suggest you look up the HtmlLinkExtractor class in the online Javadoc for more URL-extracting options.
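To make the idea of "extracting URLs from iframe tags" concrete, here is a minimal sketch using only the Java standard library. It is not Norconex's implementation and does not use its API; the class name IframeSrcExtractor and the page URL in the usage note are hypothetical, and the regex is a simplification of what a real HTML parser would do.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of any crawler's API.
class IframeSrcExtractor {

    // Matches the quoted src attribute of <iframe> tags. A regex is a
    // simplification; a real crawler would use a proper HTML parser.
    private static final Pattern IFRAME_SRC = Pattern.compile(
            "<iframe\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all iframes found in the given HTML,
    // resolving relative src values against the URL of the containing page.
    static List<URI> extractIframeUrls(String html, URI pageUrl) {
        List<URI> urls = new ArrayList<>();
        Matcher m = IFRAME_SRC.matcher(html);
        while (m.find()) {
            try {
                urls.add(pageUrl.resolve(m.group(2).trim()));
            } catch (IllegalArgumentException e) {
                // Skip src values that are not valid URIs.
            }
        }
        return urls;
    }
}
```

Called with the iframe tag from the question and an assumed page URL of http://example.com/index.html, this returns http://example.com/ee/mypage/default.htm, which can then be fetched and parsed like any other page. Note that this only sees iframes present in the downloaded HTML; frames injected by JavaScript would still need a browser-based tool.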
Upvotes: 1