everreadyeddy

Reputation: 748

Jsoup not getting full html

I am trying to use Jsoup to parse the HTML from the URL http://www.threadflip.com/shop/search/john%20hardy

Jsoup seems to get only the data from the line

<![CDATA[ window.gon= ..............

Does anyone know why this would be?

Document doc = Jsoup.connect("http://www.threadflip.com/shop/search/john%20hardy").get();

Upvotes: 0

Views: 690

Answers (1)

luksch

Reputation: 11712

The site you are trying to parse loads most of its content asynchronously via AJAX calls. Jsoup does not execute JavaScript, so it does not behave like a browser: it sees only the initial HTML that the server returns. The store appears to be filled by calling their API:

http://www.threadflip.com/api/v3/items?attribution%5Bapp%5D=web&item_collection_id=&q=john+hardy&page=1&page_size=30

So you probably need to load the API URL directly in order to read the data you want. Note that the response is JSON, not HTML, so the Jsoup HTML parser is not much help here. There are good JSON libraries available, though; I use JSON-Simple.
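As a rough sketch of loading the API URL directly, the following uses only the JDK's built-in HTTP client (Java 11+) rather than Jsoup. The class name ThreadflipApiSketch and the helpers buildSearchUrl and fetchJson are made up for illustration; the endpoint and query parameters are copied from the URL above, and the site may have changed or removed them since.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of Jsoup or of the site's own code.
class ThreadflipApiSketch {

    // Endpoint taken from the URL quoted above (assumed to still be valid).
    static final String API_BASE = "http://www.threadflip.com/api/v3/items";

    // Form-encodes a value: spaces become '+', brackets become %5B / %5D.
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Rebuilds the search URL with the same parameters as the one above.
    static String buildSearchUrl(String query, int page, int pageSize) {
        return API_BASE
                + "?" + enc("attribution[app]") + "=web"
                + "&item_collection_id="
                + "&q=" + enc(query)
                + "&page=" + page
                + "&page_size=" + pageSize;
    }

    // Downloads the raw JSON text; it still has to be parsed with a JSON library.
    static String fetchJson(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) {
        // Prints the URL only; call fetchJson(...) on it to perform the request.
        System.out.println(buildSearchUrl("john hardy", 1, 30));
    }
}
```

If you would rather stay with Jsoup for the download, Jsoup.connect(url).ignoreContentType(true).execute().body() should return the same raw JSON string, since ignoreContentType(true) stops Jsoup from rejecting non-HTML responses. Either way, the string then goes to a JSON parser rather than to Jsoup's HTML parser.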

Alternatively, you could switch to Selenium WebDriver, which remote-controls a real browser. It should have no trouble accessing all the items on the page.

Upvotes: 1
