I'm looking for a way to extract the content of news articles from sites like CNN or The New York Times using Jsoup.
I tried the following code:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
Document document = Jsoup.connect("http://edition.cnn.com/2013/11/10/world/asia/philippines-typhoon-haiyan/index.html").get();
Element contents = document.select("#content").first();
System.out.println(contents.html());
System.out.println(contents.text());
I received this error:
Exception in thread "main" java.lang.NullPointerException
at com.clearforest.Test.main(Test.java:36)
Do you have any idea how I can extract the clean text of an article?
Your contents Element is null after the select call. The selector you specified ("#content") matches nothing in the document downloaded from CNN, so first() returns null and the following contents.html() call throws the NullPointerException. Try something like document.select("div.cnn_strycntntlft") instead, which returns the story div's contents.
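Here is a minimal sketch of the answer's advice. It checks first() for null instead of dereferencing it blindly and tries several candidate selectors in order. It parses an inline HTML string with Jsoup.parse so it runs without network access. The class name ArticleText, the method extractText, and the fallback selectors "article" and "#content" are illustrative assumptions. CNN's markup has very likely changed since 2013, so inspect the live page before choosing selectors.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ArticleText {

    // Candidate selectors, tried in order. "div.cnn_strycntntlft" comes from the answer;
    // "article" and "#content" are hypothetical fallbacks, not verified against any site.
    static final String[] SELECTORS = {"div.cnn_strycntntlft", "article", "#content"};

    // Returns the text of the first element matched by any selector, or null if none match.
    static String extractText(Document doc) {
        for (String selector : SELECTORS) {
            // Elements.first() returns null when the selection is empty, so check before use.
            Element match = doc.select(selector).first();
            if (match != null) {
                return match.text();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Parse a small inline HTML string instead of fetching a live page, so this runs offline.
        String html = "<html><body><div class=\"cnn_strycntntlft\"><p>Story text.</p></div></body></html>";
        Document doc = Jsoup.parse(html);
        String text = extractText(doc);
        System.out.println(text != null ? text : "No matching element found");
    }
}
```

For a real page, you would replace Jsoup.parse(html) with the Jsoup.connect(...).get() call from the question. The null check turns a crash into a clear "no match" result that you can log or handle.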