Gilad

Reputation: 3

Jsoup - Trying to extract Comments number from web page

I'm trying to extract the overall comments number from a web page using Jsoup. For example, here is a page (CNN): http://edition.cnn.com/2011/POLITICS/07/31/debt.talks/index.html?hpt=T1

I can see that the element's class (or ID) is cnn_strycmtsndff, but I can't figure out the right command to extract it.

Can someone help?

Thanks

Upvotes: 0

Views: 553

Answers (1)

Aaron Foltz

Reputation: 138

Unfortunately, I don't think Jsoup is going to cut it. If you use the Chrome developer tools, you can clearly pick out the HTML that presents the "(##### Comments)" section, but if you just view the source, none of that information is there. It looks like the page uses JavaScript to inject that section dynamically after it loads.

This is what you see in "View Source":

<div id="disqus_thread"></div><script type="text/javascript" src="http://cnn.disqus.com/embed.js"></script>
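As a quick sanity check of that claim, here is a minimal Java sketch (standard library only, no Jsoup) that feeds the "View Source" excerpt above into a hypothetical helper, appearsInStaticSource, and looks for the class name. It is a plain substring test, not an HTML parser; it only shows that the marker is absent from the HTML the server sends before any JavaScript runs.

```java
public class StaticSourceCheck {

    // Hypothetical helper: returns true if the name appears anywhere in the raw
    // HTML returned by the server. This is a plain substring check, not an HTML
    // parse; it only answers "is this text present before any JavaScript runs?"
    static boolean appearsInStaticSource(String rawHtml, String name) {
        return rawHtml.contains(name);
    }

    public static void main(String[] args) {
        // The "View Source" excerpt quoted in this answer.
        String rawHtml = "<div id=\"disqus_thread\"></div>"
                + "<script type=\"text/javascript\" src=\"http://cnn.disqus.com/embed.js\"></script>";

        // The comment-count class is not in the static source...
        System.out.println(appearsInStaticSource(rawHtml, "cnn_strycmtsndff"));
        // ...only the empty placeholder that the Disqus script fills in later.
        System.out.println(appearsInStaticSource(rawHtml, "disqus_thread"));
    }
}
```

Running it prints false and then true: the only thing in the static page is an empty placeholder div plus the script that later fills it.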

Jsoup only parses the HTML the server returns and does not execute JavaScript, so it will never see the elements that hold the comment count.
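If you do obtain the rendered page from a tool that executes JavaScript (for example a headless browser; which one you use is outside the scope of this answer), pulling the number out of the "(##### Comments)" text becomes a small regex job. The sketch below assumes the text has exactly that shape; parseCommentCount is a hypothetical helper name, and the sample strings are made up for illustration.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentCount {

    // ASSUMPTION: the rendered widget text looks like "(12345 Comments)",
    // possibly with thousands separators, e.g. "(1,234 Comments)".
    // Adjust the pattern if the real text differs.
    private static final Pattern COUNT =
            Pattern.compile("\\((\\d[\\d,]*)\\s+Comments?\\)");

    // Hypothetical helper: extracts the count from text that has already been
    // rendered by a JavaScript-capable tool (not from the raw page source).
    static OptionalInt parseCommentCount(String renderedText) {
        Matcher m = COUNT.matcher(renderedText);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        // Strip separators before parsing, e.g. "1,234" -> 1234.
        return OptionalInt.of(Integer.parseInt(m.group(1).replace(",", "")));
    }

    public static void main(String[] args) {
        System.out.println(parseCommentCount("Debt talks story (12345 Comments)"));
        System.out.println(parseCommentCount("(1,234 Comments)"));
        System.out.println(parseCommentCount("no widget here"));
    }
}
```

Returning OptionalInt rather than a sentinel like -1 makes the "widget not rendered yet" case explicit, which is exactly what you would get if you ran this on the raw source Jsoup sees.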

Upvotes: 1
