rutruth

Reputation: 790

How can I fetch HTML tags in an HTML document?

Hey, I want to get the tags from an HTML document, that is, everything contained within angle brackets, including the angle brackets themselves. How can I do this in Java? Thanks.

Upvotes: 0

Views: 153

Answers (2)

Andreas Dolk

Reputation: 114757

<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>

... and use an HTML parser.


If you want to do it manually, iterate over the input characters and decide for each and every < and > whether it belongs to a tag or not. There are some rules to follow: processing instructions, comments, CDATA content, and angle brackets in attribute values(!).

Most parsers use some switch/case pattern for evaluating each token (char in your case).
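
As a rough illustration of that switch/case idea, here is a minimal sketch of a character-by-character scanner. The class name TagScanner and its method extractTags are made up for this example and come from no library. It assumes XML-style CDATA sections (ending at "]]>"), treats declarations and processing instructions as ending at the first ">", and does not special-case script or style content, so it is a toy and not a replacement for a real parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any library): collects the raw text of
// start and end tags, angle brackets included.
class TagScanner {

    private enum State { TEXT, TAG, DOUBLE_QUOTED, SINGLE_QUOTED }

    static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.TEXT;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        int skipTo = skipNonTag(html, i);
                        if (skipTo > i) {        // comment, CDATA, declaration or PI
                            i = skipTo;
                            continue;            // resume the while loop after it
                        }
                        if (startsTag(html, i)) {
                            current.setLength(0);
                            current.append(c);
                            state = State.TAG;
                        }                        // otherwise a stray '<' is plain text
                    }
                    break;
                case TAG:
                    current.append(c);
                    if (c == '"') {
                        state = State.DOUBLE_QUOTED;
                    } else if (c == '\'') {
                        state = State.SINGLE_QUOTED;
                    } else if (c == '>') {       // end of tag outside any quotes
                        tags.add(current.toString());
                        state = State.TEXT;
                    }
                    break;
                case DOUBLE_QUOTED:              // '>' in here belongs to the attribute value
                    current.append(c);
                    if (c == '"') state = State.TAG;
                    break;
                case SINGLE_QUOTED:
                    current.append(c);
                    if (c == '\'') state = State.TAG;
                    break;
            }
            i++;
        }
        return tags;                             // an unterminated tag at the end is dropped
    }

    // Returns the index just past a comment, CDATA section, declaration or
    // processing instruction starting at i, or i itself if none starts there.
    private static int skipNonTag(String html, int i) {
        String[][] pairs = { {"<!--", "-->"}, {"<![CDATA[", "]]>"}, {"<!", ">"}, {"<?", ">"} };
        for (String[] p : pairs) {
            if (html.startsWith(p[0], i)) {
                int end = html.indexOf(p[1], i + p[0].length());
                return end < 0 ? html.length() : end + p[1].length();
            }
        }
        return i;
    }

    // A tag starts with '<' followed by a letter, or by '/' and a letter.
    private static boolean startsTag(String html, int i) {
        int next = i + 1;
        if (next < html.length() && html.charAt(next) == '/') next++;
        return next < html.length() && Character.isLetter(html.charAt(next));
    }

    public static void main(String[] args) {
        String sample = "<!-- Read carefully -->\n"
                + "<b><![CDATA[<Everything in angle brackets (\"<>\") is a tag?>]]></b>";
        System.out.println(extractTags(sample)); // prints [<b>, </b>]
    }
}
```

On the snippet at the top of this answer, the sketch reports only <b> and </b>: the comment and the CDATA content are skipped even though they contain angle brackets. Even this small version needs four states and two helpers, which is a fair hint at why a real parser is the safer choice.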

Upvotes: 3

bert

Reputation: 7696

I used jsoup recently. It has a nice API, is easy to use, and has given me no problems so far. Don't even try to parse HTML yourself. See Andreas_D's answer.

Upvotes: 2
