mudit

Reputation: 25536

Extract HTML content from GWT page

I want to parse the content of an HTML page built with GWT. I tried parsing it with the Jericho HTML Parser, but the page source contains no content. After some research on GWT pages, I learned that GWT applications are written in Java and that the GWT compiler turns the Java code into a complex set of JavaScript files that render the HTML content in the browser.

Is there a way I can parse these kinds of pages?
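To illustrate why the static source looks empty: a GWT host page is typically just a shell whose only job is to load a bootstrap script named `<module>.nocache.js`, which then builds the DOM at runtime. The following is a minimal sketch that checks fetched HTML for such a script reference. The class name `GwtShellCheck`, the method `findBootstrapScript`, and the sample markup are hypothetical and only serve the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GwtShellCheck {
    // Matches a <script> tag whose src attribute ends in ".nocache.js",
    // the usual name of the GWT bootstrap script (assumption: default linker output).
    private static final Pattern NOCACHE_SCRIPT = Pattern.compile(
            "<script[^>]*\\ssrc\\s*=\\s*[\"']([^\"']*\\.nocache\\.js)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the bootstrap script path if the HTML references one, otherwise null.
    static String findBootstrapScript(String html) {
        Matcher m = NOCACHE_SCRIPT.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical host page: no visible content, only the bootstrap script.
        String html = "<html><head>"
                + "<script type=\"text/javascript\" src=\"myapp/myapp.nocache.js\"></script>"
                + "</head><body></body></html>";
        System.out.println(findBootstrapScript(html)); // prints myapp/myapp.nocache.js
    }
}
```

If this returns a path, a static parser such as Jericho will only ever see the shell, because the visible content does not exist until the script has run.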

Upvotes: 1

Views: 523

Answers (2)

HashimR

Reputation: 3833

If the code was compiled in OBF (obfuscated) mode, which is the usual choice for production, it will be VERY difficult, because the generated JS files are not human-readable.

This link might help you understand the GWT compiler better.

EDIT:

Here you go. This might also be helpful. It explains how to de-obfuscate the JavaScript.

EDIT2:

GWT-Penetration-Testing-Toolset - Check this tool.

Upvotes: 1

Thomas Broyer

Reputation: 64541

Just like with (m)any "single-page web app" (including e.g. Twitter, which is not built with GWT), you have to run the JavaScript code and then scrape the DOM.

This can be done easily (everything's relative) with HtmlUnit, PhantomJS, or similar tools.
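As a rough sketch of the HtmlUnit route: load the page in a headless `WebClient`, give the background JavaScript some time to finish, then read the rendered DOM. This assumes HtmlUnit 3.x or later on the classpath (package `org.htmlunit`; 2.x releases use `com.gargoylesoftware.htmlunit` instead). The class name `GwtPageScraper`, the default URL, and the 10-second wait are placeholders, and some GWT apps may need a longer wait or may not run cleanly in HtmlUnit's JavaScript engine.

```java
import java.io.IOException;

import org.htmlunit.BrowserVersion;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class GwtPageScraper {
    // Loads the URL, lets the page's JavaScript run, and returns the visible text
    // of the resulting DOM. jsWaitMillis is an upper bound on the wait, not a guarantee
    // that the app has finished rendering.
    static String renderedText(String url, long jsWaitMillis) throws IOException {
        try (WebClient client = new WebClient(BrowserVersion.CHROME)) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setCssEnabled(false);
            // Real-world scripts often trigger errors that a browser would ignore.
            client.getOptions().setThrowExceptionOnScriptError(false);

            HtmlPage page = client.getPage(url);
            // Block until background JS (timers, XHR callbacks) finishes or the timeout elapses.
            client.waitForBackgroundJavaScript(jsWaitMillis);

            // page.asXml() would return the rendered markup instead of plain text.
            return page.asNormalizedText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; pass the real page as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/app.html";
        System.out.println(renderedText(url, 10_000));
    }
}
```

The markup from `page.asXml()` can then be handed to an ordinary HTML parser such as Jericho, since by that point the content exists in the DOM.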

Upvotes: 1
