Reputation: 21
I'm trying to parse the page http://www.vermittlerregister.org with HtmlUnit. The problem is that I don't get the requested page. Instead I get the website's timeout page, which doesn't make any sense to me.
final WebClient webClient = new WebClient();
webClient.getPage("http://www.vermittlerregister.org");
The only warning I get on the console is:
com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNUNG: Obsolete content type encountered: 'application/x-javascript'.
This warning shouldn't cause a different page to be returned than the one I requested.
By the way: if I use java.net.URL from the standard Java API instead, I do get the correct page content.
Upvotes: 2
Views: 1026
Reputation: 5341
The page you're fetching has a refresh instruction - users get redirected to a timeout message after half an hour:
<meta id="ctl00_MetaRefresh" http-equiv="REFRESH"
content="1800;url=http://www.vermittlerregister.org:80//system/logout.aspx?timeout=true" />
HtmlUnit needs to decide whether to give you the current page or the one the refresh is going to send you to. Its default behaviour is to follow all refresh instructions immediately, regardless of their delay (WebClient uses an ImmediateRefreshHandler). You can change this to a NiceRefreshHandler instead, which lets you choose which refreshes to follow according to their delay times:
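import com.gargoylesoftware.htmlunit.NiceRefreshHandler;
import com.gargoylesoftware.htmlunit.WebClient;

final WebClient webClient = new WebClient();
// Only follow refresh instructions whose delay is 5 seconds or less.
webClient.setRefreshHandler(new NiceRefreshHandler(5));
webClient.getPage("http://www.vermittlerregister.org");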
This tells the WebClient to follow a refresh only if its delay is 5 seconds or less, so it will ignore the 30-minute refresh instruction on your page.
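To make the delay threshold concrete, here is a minimal stand-alone sketch of the kind of decision described above. It is not HtmlUnit's actual implementation: the class RefreshDecision and its methods are hypothetical names used only for illustration, and the second content value ("0;url=/home") is an invented example. It parses the delay from a meta refresh content value and follows the refresh only when that delay is within the limit.

```java
// Hypothetical illustration only; this class is not part of HtmlUnit.
final class RefreshDecision {

    /** Extracts the delay in seconds from a meta refresh content value such as "1800;url=...". */
    static int parseDelaySeconds(String content) {
        int semicolon = content.indexOf(';');
        String delay = (semicolon >= 0 ? content.substring(0, semicolon) : content).trim();
        return Integer.parseInt(delay);
    }

    /** Returns true when the refresh delay is at most maxDelaySeconds. */
    static boolean shouldFollow(String content, int maxDelaySeconds) {
        return parseDelaySeconds(content) <= maxDelaySeconds;
    }

    public static void main(String[] args) {
        // Content value taken from the page's meta refresh tag.
        String timeout =
            "1800;url=http://www.vermittlerregister.org:80//system/logout.aspx?timeout=true";
        System.out.println(shouldFollow(timeout, 5));        // prints "false"
        // Invented example of an immediate redirect.
        System.out.println(shouldFollow("0;url=/home", 5));  // prints "true"
    }
}
```

With a limit of 5 seconds, the 1800-second timeout redirect is skipped while an immediate redirect would still be followed, which is the behaviour you want for this page.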
Upvotes: 2