Geert

Reputation: 355

Load an HTML page including generated HTML

Google seems to be failing me today: I'm looking for a way to load a remote HTML page into my Java application. This HTML page contains some JavaScript that generates most of the content. I thought it would be fairly straightforward to open the page in Java and have a look at the HTML.

When I use URL.openStream() to read the page, I get the HTML source with the JavaScript but without the generated HTML (which is what I would expect). So how do I get from this to the HTML source including the generated content? After a few hours on Google I got completely entangled in Rhino, EnvJs, and Jsoup, and none of it is really getting me anywhere.
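A minimal sketch of the URL.openStream() approach described above, to show what it does and does not return. The class name `RawHtmlFetcher`, the method name `fetchRawHtml`, the UTF-8 charset, and the fallback URL are assumptions made for illustration; they are not taken from the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class RawHtmlFetcher {

    // Reads the document exactly as the server sends it. No JavaScript is
    // executed, so any markup that scripts would generate is not in the result.
    static String fetchRawHtml(URL url) throws IOException {
        StringBuilder html = new StringBuilder();
        // UTF-8 is assumed here; a real page may declare a different charset.
        try (InputStream in = url.openStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real page as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(fetchRawHtml(URI.create(address).toURL()));
    }
}
```

The returned string still contains the script tags themselves, but not the elements those scripts would add to the document once run in a browser.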

Does anyone have any suggestions?

Upvotes: 1

Views: 271

Answers (1)

Pixou

Reputation: 1769

Yes, basically there is no easy solution: you need to actually render the page, so you need a JavaScript engine (as feeela says).

One solution is to use WebKit. I haven't used it from Java, only from Python. You may want to look at the question "WebKit browser in Java app on multiple platforms".

Upvotes: 2
