Nikhil

Reputation: 1309

Retrieving the contents of a URL after they have been changed by JavaScript

I am having trouble retrieving the contents of an HTML page using Java. The problem is described below.

  1. I am loading a URL in Java, which returns an HTML page.

  2. This page uses JavaScript. When I load the URL in a browser, a JavaScript function runs AFTER the page has loaded (from the onload handler of the HTML body) and modifies some content on the page (the innerHTML of one of the divs). The change is visible in the browser.

  3. When I try to do the same thing using Java, I only get the HTML content of the page as it was BEFORE the JavaScript call occurred.

  4. What I want is to fetch the contents of the HTML page after the JavaScript function call has occurred, and all of this has to be done using Java.

How can I do this? What should my approach be?

Upvotes: 1

Views: 427

Answers (2)

Nikhil

Reputation: 1309

For anyone reading this answer: Scott's answer was a starting point for me, but the Cobra project is long dead and cannot handle pages that use complex JavaScript.

However, there is a library called HtmlUnit which does exactly what I want.

Here is a small description:

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.

It is typically used for testing purposes or to retrieve information from web sites.

Upvotes: 1

Scott

Reputation: 21511

You need to use a server-side browser library that also executes the JavaScript, so that you get the DOM contents as updated by the scripts. A plain HTTP fetch only returns the HTML as the server sent it and never runs any scripts, which is why you don't get the expected result.
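To illustrate the point, here is a minimal sketch using only the JDK (Java 11 or later). The page content, the `result` div id, and the class and method names are all made up for the demonstration and are not taken from the question. It serves a small page from a throwaway local server and fetches it the way a plain HTTP client would:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class PlainFetchDemo {

    // Hypothetical page: the div starts as "loading..." and an onload
    // handler would rewrite it in a real browser.
    static final String PAGE =
        "<html><body onload=\"update()\">"
        + "<div id=\"result\">loading...</div>"
        + "<script>function update() {"
        + " document.getElementById('result').innerHTML = 'updated by ' + 'script'; }</script>"
        + "</body></html>";

    // Fetches the URL over plain HTTP and returns the body exactly as sent.
    // Nothing here parses the HTML or executes any script.
    static String fetchRaw(String url) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Serves PAGE from a local server on a free port, fetches it, returns the raw body.
    static String fetchDemoPage() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] bytes = PAGE.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, bytes.length);
            exchange.getResponseBody().write(bytes);
            exchange.close();
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            return fetchRaw("http://127.0.0.1:" + port + "/");
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String html = fetchDemoPage();
        // The placeholder text is still there...
        System.out.println(html.contains("loading..."));
        // ...and the text the script would have produced is not.
        System.out.println(html.contains("updated by script"));
    }
}
```

The fetched string still contains the placeholder and the script source, because nothing on the Java side ever ran `update()`. A library that builds a DOM and executes the page's scripts is needed to see the modified content.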

You should try Cobra: Java HTML Parser, which will execute your JavaScript. See here for the download and for the documentation on how to use it.

Cobra:

It is Javascript-aware. DOM modifications that occur during parsing will be reflected in the resulting DOM. However, Javascript can be disabled.

Upvotes: 1
