Reputation: 265
I want to scrape HTML from websites like http://www3.mangafreak.net/Manga/One_Piece using Jsoup and HtmlUnit. The problem with sites like this is that the first response is
Status Code: 503 Service Temporarily Unavailable
and then, after a few seconds, the page reloads with
Status Code: 200 OK
Upvotes: 0
Views: 132
Reputation: 2889
Try this (HtmlUnit only):
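// HtmlUnit 2.x packages; HtmlUnit 3.x uses org.htmlunit instead
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebWindow;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
WebClient webClient = new WebClient();
// Do not throw an exception on the initial 503 response
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = (HtmlPage) webClient.getPage("http://www3.mangafreak.net/Manga/One_Piece");
System.out.println(page.asXml()); // the interim 503 page
WebWindow window = page.getEnclosingWindow();
// Block until the JavaScript jobs scheduled to start within the next 5 seconds have finished
window.getJobManager().waitForJobsStartingBefore(5000);
// Re-read the window's current page, which may have been replaced by the reload
page = (HtmlPage) window.getEnclosedPage();
System.out.println(page.asXml());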
Now you have the page, and you can use the HtmlUnit API to work with the DOM tree or to click on elements.
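If the fixed 5-second wait turns out to be too short or too long for a given site, one alternative is to poll until the status code is the one you expect. The following is only a minimal sketch in plain Java: `StatusPoller` and `waitForStatus` are hypothetical names, not part of HtmlUnit, and the status source in `main` is simulated rather than a real page.

```java
import java.util.function.IntSupplier;

public class StatusPoller {

    /**
     * Repeatedly asks statusSource for a status code until it equals
     * expected or timeoutMillis has elapsed. Returns true on a match,
     * false on timeout or if the thread is interrupted while sleeping.
     */
    public static boolean waitForStatus(IntSupplier statusSource, int expected,
                                        long timeoutMillis, long intervalMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (statusSource.getAsInt() == expected) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without seeing the expected status
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated source: reports 503 twice, then 200
        int[] calls = {0};
        boolean ok = waitForStatus(() -> ++calls[0] < 3 ? 503 : 200, 200, 2000, 10);
        System.out.println(ok + " after " + calls[0] + " checks");
    }
}
```

With HtmlUnit, the status source could be something like `() -> window.getEnclosedPage().getWebResponse().getStatusCode()`, using the `window` variable from the snippet above. I have not tested that against this particular site.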
Upvotes: 1