Sportalcraft

Reputation: 265

Extracting text non-recursively with Jsoup

This is the code I'm trying to run:

String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";

Document doc = Jsoup.parse(html); // parse the HTML string
Element element = doc.getAllElements().first(); // receive the name element

System.out.println(element.text()); //prints "ZOLA (1)"
System.out.println(element.ownText()); // prints nothing

My goal is to extract only "ZOLA", without the text of the child nodes, but ownText() prints nothing. How should I do it?

Upvotes: 4

Views: 152

Answers (2)

Arvind Kumar Avinash

Reputation: 79115

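The problem is that doc.getAllElements().first() returns the root of the parsed document rather than the <a> element. Jsoup wraps the fragment in a full HTML skeleton, and the root has no text of its own, so ownText() is empty. The element you get is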

<html>
 <head></head>
 <body>
  <a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
 </body>
</html>

while you expect

<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>

The following should work for you:

String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";

Document doc = Jsoup.parse(html);
Elements links = doc.getElementsByTag("a");
System.out.println(links.get(0));
System.out.println(links.get(0).ownText());

Output:

<a href="/name/zola-1">ZOLA <span class="tiny">(1)</span></a>
ZOLA
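
To see the difference side by side, here is a minimal sketch that compares the own text of the first element in the document with the own text of the link. The class name OwnTextDemo and its two helper methods are made up for illustration, and it assumes jsoup is on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical demo class; only the Jsoup calls come from the library.
public class OwnTextDemo {
    static final String HTML =
            "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";

    // Own text of the first element in document order (the document root).
    static String rootOwnText() {
        Document doc = Jsoup.parse(HTML);
        Element first = doc.getAllElements().first();
        return first.ownText();
    }

    // Own text of the <a> element, excluding the text inside <span>.
    static String linkOwnText() {
        Document doc = Jsoup.parse(HTML);
        Element link = doc.selectFirst("a");
        return link.ownText();
    }

    public static void main(String[] args) {
        System.out.println("root: [" + rootOwnText() + "]");
        System.out.println("link: [" + linkOwnText() + "]");
    }
}
```

The root has no direct text nodes, so its own text is empty, while the link's own text is just the part that sits outside the nested <span>.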

Upvotes: 1

SternK

Reputation: 13071

You can select the first <a> element with selectFirst and call ownText() on it:

String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
Document doc = Jsoup.parse(html);
Element elementA =  doc.selectFirst("a");
System.out.println(elementA.ownText()); // ZOLA
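
One thing to watch for on real pages: selectFirst returns null when nothing matches the selector, so calling ownText() directly can throw a NullPointerException. A possible guard is sketched below. The class SafeOwnText and the helper ownTextOrEmpty are hypothetical names, not part of Jsoup, and jsoup is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; not part of the Jsoup API.
public class SafeOwnText {
    // Returns the own text of the first element matching cssQuery,
    // or an empty string if no element matches.
    static String ownTextOrEmpty(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element match = doc.selectFirst(cssQuery); // null when there is no match
        return match == null ? "" : match.ownText();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/name/zola-1\">ZOLA <span class=\"tiny\">(1)</span></a>";
        System.out.println(ownTextOrEmpty(html, "a"));  // own text of the link
        System.out.println(ownTextOrEmpty(html, "h1")); // no <h1>, so empty
    }
}
```

Returning an empty string is just one choice; depending on the caller, returning null or an Optional may fit better.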

Upvotes: 1
