Hendra Anggrian

Reputation: 5848

Get inner tags from HTML

I'm new to Jsoup, so any help would be great. This official tutorial shows that the code below gets the inner tags as a String:

String html = "<p>An <a href='http://example.com/'><b>example</b></a> link.</p>";
Document doc = Jsoup.parse(html);
Element link = doc.select("a").first();

String text = doc.body().text(); // "An example link."
String linkHref = link.attr("href"); // "http://example.com/"
String linkText = link.text(); // "example"

String linkOuterH = link.outerHtml();
    // "<a href="http://example.com/"><b>example</b></a>"
String linkInnerH = link.html(); // "<b>example</b>"

But what if the HTML string is extremely long and contains a lot of tags? Is there a limit in that case? How could I make an array or ArrayList of String containing the inner tags of every "a" element?

Upvotes: 0

Views: 388

Answers (2)

Syam S

Reputation: 8499

Jsoup is similar to a DOM parser: it parses the entire HTML into an in-memory tree structure. So the size of the document it can parse depends on the Java heap size you have configured.
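As a rough way to check how much heap is available before parsing a very large document, something like the sketch below could be used. The class name HeapInfo is made up for illustration; the limit itself is normally raised with the standard -Xmx JVM option.

```java
class HeapInfo {
    /** Returns the maximum amount of heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // The parsed DOM tree has to fit in this heap alongside everything else.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running with, for example, `java -Xmx2g HeapInfo` should report a larger value than the default.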

As for getting a tag, there are several ways. The easiest is the document.select() method, as in Masud's answer.

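// Requires: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
// java.util.ArrayList, java.util.List
Document document = Jsoup.parse(html);
List<String> tags = new ArrayList<String>();
for (Element e : document.select("a")) {
    tags.add(e.tagName()); // tag name of each match, i.e. "a"
}
System.out.println("The tags = " + tags);

// If you want it as an array
String[] tagsArray = tags.toArray(new String[tags.size()]);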
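The loop above stores tag names. If the goal is the inner HTML of every "a" element, as the question asks, a minimal sketch could call Element.html() instead. The class name AnchorInnerHtml, the method name collectInnerHtml and the sample markup are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AnchorInnerHtml {
    /** Collects the inner HTML of every <a> element found in the given markup. */
    static List<String> collectInnerHtml(String html) {
        Document document = Jsoup.parse(html);
        List<String> inner = new ArrayList<>();
        for (Element a : document.select("a")) {
            inner.add(a.html()); // inner HTML, e.g. "<b>example</b>"
        }
        return inner;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two links.
        String html = "<p><a href='/one'><b>first</b></a> and "
                + "<a href='/two'><i>second</i></a></p>";
        List<String> inner = collectInnerHtml(html);
        System.out.println(inner); // [<b>first</b>, <i>second</i>]

        // The same values as an array, if one is needed.
        String[] innerArray = inner.toArray(new String[0]);
        System.out.println(innerArray.length); // 2
    }
}
```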

For more options, see this answer: How to Count total Html Tags using Jsoup

Upvotes: 1

Masudul

Reputation: 21961

How could I make an array or arraylist of String containing all inner tags of "a"

You can get an Elements object from doc. Elements is a list (it extends ArrayList<Element>) containing all the <a> tags.

   Document doc = Jsoup.parse(html);
   Elements allAnchorTags = doc.select("a");

   System.out.println(allAnchorTags); // Prints the outer HTML of every matched <a> element.
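Because Elements behaves like a list, per-element values can also be pulled out in one call. The sketch below assumes a Jsoup version that provides Elements.eachAttr() and Elements.eachText(); the class name AnchorElementsDemo and the sample markup are made up for illustration.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

class AnchorElementsDemo {
    /** Returns the href attribute of every <a> element that has one. */
    static List<String> hrefs(String html) {
        Elements anchors = Jsoup.parse(html).select("a");
        return anchors.eachAttr("href");
    }

    /** Returns the text content of every <a> element that has text. */
    static List<String> texts(String html) {
        Elements anchors = Jsoup.parse(html).select("a");
        return anchors.eachText();
    }

    public static void main(String[] args) {
        // Hypothetical sample input with two links.
        String html = "<p><a href='/one'><b>first</b></a> and "
                + "<a href='/two'><i>second</i></a></p>";
        System.out.println(hrefs(html)); // [/one, /two]
        System.out.println(texts(html)); // [first, second]
    }
}
```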

Upvotes: 0
