Dinesh Kumar P

Reputation: 1168

Crawl IFrame content of a webpage using Java

I would like to crawl the IFrame content (dynamic content) of a webpage.

So far, none of the crawlers I have tried (Aperture, Crawl4j) support this. The result I get contains only the IFrame tag itself, not its content:


      <iframe id="template_content_frame" src="/ee/mypage/default.htm" width="100%" frameborder="0" name="content_frame">
      </iframe>

So I started looking at Crawljax. Does it support crawling IFrame content? I came across this issue; it appears to be closed rather than fixed, so I am unsure whether Crawljax supports this or not.

Has anyone tried this before, or does anyone have another way to crawl dynamic content such as an IFrame?

Upvotes: 1

Views: 658

Answers (1)

Pascal Essiembre

Reputation: 81

Norconex HTTP Collector is an open source enterprise web crawler that supports crawling of frame and iframe tags out of the box. You can also add your own set of tags and attributes to be used to extract URLs (e.g., frame.longdesc, video.src, form.action, etc.). You need no programming skills to use this crawler, but since you seem to know your Java, you can also plug in your own URL-extraction logic if you prefer.
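To illustrate what "extracting URLs from frame and iframe tags" amounts to, here is a minimal sketch in plain Java, independent of any crawler. The class and method names (`FrameLinkExtractor`, `extractFrameUrls`) and the `example.com` base URL are made up for the example and are not part of the Norconex API. It uses a regular expression for brevity, which is a simplification; a real crawler would more likely use an HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not taken from any crawler library.
final class FrameLinkExtractor {

    // Matches <frame ...> or <iframe ...> and captures a quoted src value.
    // Simplification: unquoted attribute values are not handled.
    private static final Pattern FRAME_SRC = Pattern.compile(
            "<i?frame\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all frame/iframe sources found in html,
    // resolved against the URL of the page that contains them.
    // URI.resolve may throw IllegalArgumentException for a malformed src.
    static List<String> extractFrameUrls(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            String src = m.group(2).trim();
            if (!src.isEmpty()) {
                urls.add(base.resolve(src).toString());
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // The iframe from the question; the page URL is an assumed example.
        String html = "<iframe id=\"template_content_frame\" "
                + "src=\"/ee/mypage/default.htm\" width=\"100%\" "
                + "frameborder=\"0\" name=\"content_frame\"></iframe>";
        System.out.println(
                extractFrameUrls("http://example.com/ee/index.htm", html));
    }
}
```

Each URL returned this way would then be queued and fetched like any other link. Note that this only covers frames whose src is present in the HTML; a frame whose content or src is set by JavaScript at runtime would still need a browser-based approach.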

Once you get more familiar with this crawler, I suggest you look up the HtmlLinkExtractor class in the online Javadoc for more URL-extraction options.

Upvotes: 1
