Reputation: 790
I want to extract the tags from an HTML document.
That is, everything contained within angle brackets, including the angle brackets themselves. How can I do this in Java?
Thanks
Upvotes: 0
Views: 153
Reputation: 114757
<!-- Read carefully -->
<b><![CDATA[<Everything in angle brackets ("<>") is a tag?>]]></b>
... and use an HTML parser.
If you want to do it manually, iterate over the input chars and decide for each and every < and > whether it belongs to a tag or not. There are some rules to follow (processing instructions, comments, CDATA content, angle brackets in attribute values(!)).
Most parsers use a switch/case construct to evaluate each token (each char, in your case).
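To give an idea of what the manual approach could look like, here is a minimal sketch of such a char-by-char scanner with a switch/case over a small set of states. The class name `TagScanner`, the method `extractTags`, and the `State` enum are made up for illustration. It assumes that comments, CDATA sections and processing instructions should be skipped rather than reported, and that a `<` only starts a tag when it is followed by a letter, `/` or `!`. It does not handle special elements such as `<script>` or malformed input, so treat it as a starting point and not as a replacement for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: collects tags such as "<b>" or "</b>".
final class TagScanner {

    private enum State { TEXT, TAG, DOUBLE_QUOTED, SINGLE_QUOTED }

    static List<String> extractTags(String html) {
        List<String> tags = new ArrayList<>();
        State state = State.TEXT;
        int tagStart = -1;
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            switch (state) {
                case TEXT:
                    if (c == '<') {
                        // Assumption: comments, CDATA and processing instructions are not tags.
                        int skipTo = skipSpecial(html, i);
                        if (skipTo >= 0) {
                            i = skipTo;
                            continue; // resume the while loop after the skipped section
                        }
                        if (i + 1 < html.length()) {
                            char next = html.charAt(i + 1);
                            if (Character.isLetter(next) || next == '/' || next == '!') {
                                tagStart = i;
                                state = State.TAG;
                            }
                        }
                    }
                    break;
                case TAG:
                    if (c == '"') {
                        state = State.DOUBLE_QUOTED;
                    } else if (c == '\'') {
                        state = State.SINGLE_QUOTED;
                    } else if (c == '>') {
                        tags.add(html.substring(tagStart, i + 1));
                        state = State.TEXT;
                    }
                    break;
                case DOUBLE_QUOTED:
                    // A '>' inside an attribute value does not end the tag.
                    if (c == '"') state = State.TAG;
                    break;
                case SINGLE_QUOTED:
                    if (c == '\'') state = State.TAG;
                    break;
            }
            i++;
        }
        return tags;
    }

    // Returns the index just past a comment, CDATA section or processing
    // instruction starting at 'from', or -1 if none starts there.
    private static int skipSpecial(String html, int from) {
        String[][] pairs = { {"<!--", "-->"}, {"<![CDATA[", "]]>"}, {"<?", "?>"} };
        for (String[] p : pairs) {
            if (html.startsWith(p[0], from)) {
                int end = html.indexOf(p[1], from + p[0].length());
                // An unterminated section swallows the rest of the input.
                return end < 0 ? html.length() : end + p[1].length();
            }
        }
        return -1;
    }
}
```

Calling `TagScanner.extractTags(html)` on the two-line example above would report only the `b` start and end tags, because the comment and the CDATA content are skipped.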
Upvotes: 3