I am trying to parse this page.
http://www.reuters.com/article/2015/07/08/us-china-cybersecurity-idUSKCN0PI09020150708
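My code looks like this:
import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
WebClient webClient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webClient.getPage("http://www.reuters.com/article/2015/07/08/us-alibaba-singapore-post-idUSKCN0PI03J20150708");
System.out.println(page.asXml());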
It produces many warnings and a huge stack trace, mostly related to the JavaScript engine. I have tried these options:
webClient.waitForBackgroundJavaScript(1000000);
webClient.setJavaScriptTimeout(1000000);
But nothing seems to work. The page runs JavaScript to load its content, so I need to wait for that to finish before I can read the content. Any ideas how I can resolve this?
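One generic way to wait for script-loaded content is to poll until it appears, with an upper bound on the total wait. The sketch below is library-free and only illustrates that polling loop: PollingWait and waitUntil are hypothetical names, and the condition in main is a simulated stand-in. With HtmlUnit, the condition would instead inspect the HtmlPage, for example checking page.asText() for some text you expect the script to insert.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: a bounded polling wait.
public final class PollingWait {

    private PollingWait() {}

    /**
     * Re-checks the condition every pollMillis until it becomes true or
     * timeoutMillis has elapsed. Returns true if the condition was met in time,
     * false on timeout.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for one poll interval, but not much past the deadline.
            long sleepMillis = Math.min(pollMillis, Math.max(1L, remainingNanos / 1_000_000L));
            Thread.sleep(sleepMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated "content" that a background script makes available after about 200 ms.
        final long readyAt = System.currentTimeMillis() + 200;
        boolean loaded = waitUntil(() -> System.currentTimeMillis() >= readyAt, 5_000, 50);
        System.out.println("loaded: " + loaded);

        // A condition that never becomes true gives up once the timeout expires.
        boolean never = waitUntil(() -> false, 100, 20);
        System.out.println("never: " + never);
    }
}
```

Bounding the wait matters here: a huge timeout such as 1000000 ms can leave the program hanging when the expected content never appears.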
You need to wait for the background JavaScript (waitForBackgroundJavaScript) just after getting the page. There is also a script error, "addImpression" is not defined; I don't know which script is supposed to define it.
I suspect you are not using a recent version, since with the latest version there are not many warnings.
With the latest snapshot, I get the content by using:
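import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
try (WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
    // Log script errors (e.g. "addImpression" is not defined) instead of throwing
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    final HtmlPage page = webClient.getPage("http://www.reuters.com/article/2015/07/08/us-alibaba-singapore-post-idUSKCN0PI03J20150708");
    // Wait up to 10 seconds for background JavaScript started by the page
    webClient.waitForBackgroundJavaScript(10000);
    System.out.println(page.asText());
}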
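The reason the wait belongs right after getPage can be shown with a small toy model. This is not HtmlUnit's implementation: BackgroundJobModel, loadPage and waitForBackgroundJobs are hypothetical stand-ins. loadPage returns immediately but schedules a "script" that fills in the content later, and waitForBackgroundJobs mirrors the idea behind waitForBackgroundJavaScript by returning how many jobs are still pending. Waiting before the page has been loaded has nothing to wait for, so the content is still empty when it is read.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Toy model with hypothetical names; not HtmlUnit's actual implementation.
public final class BackgroundJobModel {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicInteger pendingJobs = new AtomicInteger();
    private final AtomicReference<String> content = new AtomicReference<>("");

    /** Simulates getPage(): returns at once, but schedules a script that fills the content later. */
    public void loadPage(long scriptDelayMillis) {
        pendingJobs.incrementAndGet();
        scheduler.schedule(() -> {
            content.set("article text");
            pendingJobs.decrementAndGet();
        }, scriptDelayMillis, TimeUnit.MILLISECONDS);
    }

    /** Waits until no jobs are pending or the timeout expires; returns the number still pending. */
    public int waitForBackgroundJobs(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pendingJobs.get() > 0 && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
        return pendingJobs.get();
    }

    public String content() {
        return content.get();
    }

    public void shutdown() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        // Wrong order: waiting before the page is loaded returns at once.
        BackgroundJobModel early = new BackgroundJobModel();
        early.waitForBackgroundJobs(1_000);
        early.loadPage(200);
        System.out.println("wait before load -> '" + early.content() + "'");
        early.shutdown();

        // Right order: load first, then wait for the scheduled script to finish.
        BackgroundJobModel late = new BackgroundJobModel();
        late.loadPage(200);
        int remaining = late.waitForBackgroundJobs(1_000);
        System.out.println("wait after load -> '" + late.content() + "', pending=" + remaining);
        late.shutdown();
    }
}
```

Running main should show empty content in the first case and the loaded text with no pending jobs in the second, which is the same ordering the answer's code uses.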