Zoette

Reputation: 1291

JSoup doesn't retrieve links on web page

I'm parsing a sitemap with JSoup.

Document dom = Jsoup.parse(new URL(pageRacine).openStream(), "UTF-8", "https://www.lavisducagou.nc/page-sitemap.xml");
Elements liens = dom.getElementsByTag("a");
System.out.println(liens.size() + " links have been retrieved");

The output:

0 links have been retrieved

I've also tried this but it doesn't work:

Document dom = Jsoup.parse(String.valueOf(new URL("https://www.lavisducagou.nc/page-sitemap.xml").openStream()), "", Parser.xmlParser());
liens = dom.select("a");

Can someone help me? Am I crazy?

EDIT: System.out.println(dom.body()); outputs null.

Upvotes: 0

Views: 137

Answers (1)

Luk

Reputation: 2256

You don't get any links because the sitemap has no elements with the tag a. The URLs in a sitemap are in loc tags. Use Elements liens = dom.getElementsByTag("loc");

You were probably misled by what you see in the browser. The browser makes two requests: one to download the sitemap XML and a second to get main-sitemap.xsl, which tells the browser how to display the XML file.

Jsoup does not do that. Use System.out.println(dom.html()) to see what the document downloaded by Jsoup looks like.

Use the network tab in your browser's developer tools to see which resources were downloaded to display the data.
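
As a minimal sketch of the loc approach, assuming Jsoup is on the classpath: the class name SitemapLocs, the helper extractLocs, and the sample XML are made up for illustration and only stand in for the real sitemap content. The sample is parsed from a string so the example does not depend on the network.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.parser.Parser;

public class SitemapLocs {

    // Hypothetical helper: returns the text of every <loc> element in the sitemap XML.
    public static List<String> extractLocs(String sitemapXml) {
        // Parse as XML so Jsoup does not apply HTML parsing rules to the sitemap.
        Document dom = Jsoup.parse(sitemapXml, "", Parser.xmlParser());
        List<String> urls = new ArrayList<>();
        for (Element loc : dom.select("loc")) {
            urls.add(loc.text());
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up sample in the usual sitemap shape (urlset > url > loc).
        String sample =
                "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
              + "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">"
              + "<url><loc>https://example.com/</loc></url>"
              + "<url><loc>https://example.com/contact</loc></url>"
              + "</urlset>";

        List<String> liens = extractLocs(sample);
        System.out.println(liens.size() + " links have been retrieved");
        for (String lien : liens) {
            System.out.println(lien);
        }
    }
}
```

To run this against the live sitemap, one option is Jsoup.connect(url).parser(Parser.xmlParser()).get(), which downloads the content and parses it as XML. Note that String.valueOf(...) on an InputStream, as in the second attempt in the question, yields the stream object's toString() rather than the downloaded XML, so there is nothing for the parser to find.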

Upvotes: 1
