user1699918

Reputation: 11

Extract the proper content from a URL using jsoup

I'm looking for a way to extract the content of news articles from sites like CNN or The New York Times using Jsoup.

I tried the following code:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

Document document = Jsoup.connect("http://edition.cnn.com/2013/11/10/world/asia/philippines-typhoon-haiyan/index.html").get();
Element contents = document.select("#content").first();
System.out.println(contents.html());
System.out.println(contents.text());

I received this error:

Exception in thread "main" java.lang.NullPointerException
at com.clearforest.Test.main(Test.java:36)

Do you have any idea how I can extract the proper article text?

Upvotes: 0

Views: 277

Answers (1)

drew

Reputation: 2371

Your contents Element is null after the select call. The selector #content matches nothing in the document downloaded from CNN, so first() returns null, and the NullPointerException is thrown as soon as you call contents.html(). Try something like document.select("div.cnn_strycntntlft") instead, which returns the div holding the story contents.
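A minimal sketch of a more defensive version, assuming jsoup 1.11.1 or later (for Element.selectFirst) is on the classpath. It tries the question's #content selector first and then the answer's div.cnn_strycntntlft, returning an empty Optional instead of null when neither matches, so a selector that stops matching cannot cause a NullPointerException. The class name ArticleExtractor, the method extractText and the sample HTML are hypothetical; CNN's real markup has likely changed since this answer was written, so check the page source for the current story container before relying on any selector.

```java
import java.io.IOException;
import java.util.Optional;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleExtractor {

    // Tries each CSS selector in order and returns the text of the first
    // element that matches. Returns Optional.empty() rather than null.
    static Optional<String> extractText(Document doc, String... selectors) {
        for (String selector : selectors) {
            Element match = doc.selectFirst(selector); // null when nothing matches
            if (match != null) {
                return Optional.of(match.text());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Document doc;
        if (args.length > 0) {
            // Fetch a live page, e.g. the CNN article URL from the question.
            doc = Jsoup.connect(args[0]).get();
        } else {
            // Offline fallback: a tiny hypothetical page that reuses the story
            // class name from the answer, so the example runs without network access.
            doc = Jsoup.parse("<div class='cnn_strycntntlft'><p>Story text</p></div>");
        }

        // "#content" comes from the question, "div.cnn_strycntntlft" from the answer.
        Optional<String> text = extractText(doc, "#content", "div.cnn_strycntntlft");
        System.out.println(text.orElse("No article container matched; inspect the page's HTML."));
    }
}
```

Run it with no arguments to use the offline sample (it prints "Story text"), or pass an article URL as the first argument to fetch a live page.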

Upvotes: 1
