Retrieving the text of an element in jsoup

Question

While using jsoup to parse HTML pages such as google.com, I ran into a problem retrieving the text of an element.

For example, calling the text function on the div element below runs the words "Programs" and "Business" together, which I don't think is right:


   Advertising Programs
   Business Solutions
   +Google
   About Google

You can reproduce this with the following code:

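import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

URL url = new URL("http://www.google.com");
Document document = Jsoup.parse(url, 10000); // 10-second timeout
Element element = document.select("div[id=fll]").first();
System.out.println(element.text());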

The output is:

Advertising ProgramsBusiness Solutions+GoogleAbout Google

Is there anything that can be done about it?

By the way, I traced through the code and found that the problem goes away if I add this line:

textNode.text(textNode.text() + " ");

between lines 755 and 756 of the Element class in the nodes package of the jsoup source code.

The same problem exists in the Elements class of the select package, and probably in other text functions as well.
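
To make the idea behind that one-line change concrete, here is a minimal sketch that does not depend on jsoup at all. It only shows the intended behaviour: put a single space between the text of adjacent elements instead of concatenating them directly. The names TextJoiner and joinFragments are hypothetical and are not part of jsoup. The list of strings stands in for the text of each child element, which I assume could be collected with element.children() and child.text().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TextJoiner {

    // Hypothetical helper, not part of jsoup: joins text fragments with a
    // single space so words from adjacent elements do not run together.
    public static String joinFragments(List<String> fragments) {
        List<String> cleaned = new ArrayList<>();
        for (String fragment : fragments) {
            // Trim the ends and collapse internal whitespace runs to one space.
            String normalized = fragment.trim().replaceAll("\\s+", " ");
            if (!normalized.isEmpty()) {
                cleaned.add(normalized);
            }
        }
        return String.join(" ", cleaned);
    }

    public static void main(String[] args) {
        // Stand-in for the text of each link inside the div shown above.
        List<String> fragments = Arrays.asList(
                "Advertising Programs", "Business Solutions", "+Google", "About Google");
        System.out.println(joinFragments(fragments));
        // prints: Advertising Programs Business Solutions +Google About Google
    }
}
```

This is only a workaround sketch under those assumptions. It says nothing about how jsoup itself decides where to insert whitespace.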
