techiepark

Reputation: 343

How do I parse the following string embedded in HTML and build a DOM tree in Java?

I have the string below inside an HTML page, and I want to build a DOM tree from it and extract the name/value pairs. How can I do this with an HTML parser, an XML parser, or a regular expression? Any code snippet would be useful. Thanks.



<$$TagStarts>

<==0>Name0</==0><##0>Value0</##0>
<==1>Name1</==1><##1>Value1</##1>
<==2>Name2</==2><##2>Value2</##2>
<==3>Name3</==3><##3>Value3</##3>
<==4>Name4</==4><##4>Value4</##4>
<==5>Name5</==5><##5>Value5</##5>

</$$TagStarts>


Upvotes: 0

Views: 575

Answers (1)

Favonius

Reputation: 13974

Assuming the tag names are just placeholders for the sample and your real data uses meaningful tag names:

Try any of the following HTML parsers:

http://home.ccil.org/~cowan/XML/tagsoup/

http://nekohtml.sourceforge.net/

http://jtidy.sourceforge.net/

They give you a W3C-compliant Document object. After that, it is just a matter of calling getElementsByTagName or getElementById, or using XPath or XQuery, to get the elements from the DOM.

Otherwise, you can use one of the following, which have their own document object implementations:
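As a rough sketch of the getElementsByTagName step, here is how the lookup could work once you have a W3C Document. To keep it self-contained it uses the JDK's built-in XML parser rather than one of the libraries above, and it assumes the data has already been given valid tag names; the names `pairs`, `name`, and `value` and the class `DomPairs` are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPairs {
    // Parses well-formed markup and pairs the i-th <name> with the i-th <value>.
    // The tag names "name" and "value" are assumptions, not part of the question.
    static Map<String, String> extract(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList names = doc.getElementsByTagName("name");
        NodeList values = doc.getElementsByTagName("value");

        // LinkedHashMap keeps the pairs in document order.
        Map<String, String> pairs = new LinkedHashMap<>();
        int count = Math.min(names.getLength(), values.getLength());
        for (int i = 0; i < count; i++) {
            pairs.put(names.item(i).getTextContent(), values.item(i).getTextContent());
        }
        return pairs;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<pairs><name>Name0</name><value>Value0</value>"
                   + "<name>Name1</name><value>Value1</value></pairs>";
        System.out.println(extract(xml)); // prints {Name0=Value0, Name1=Value1}
    }
}
```

The same two getElementsByTagName calls should work on a Document produced by a lenient HTML parser, since they only depend on the org.w3c.dom interfaces.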

http://htmlcleaner.sourceforge.net/ [it also has some basic XPath support]

http://jsoup.org/ [it has a jQuery-like query API]

Addendum: see the jsoup selector syntax at http://jsoup.org/cookbook/extracting-data/selector-syntax

I would recommend either jsoup or NekoHTML.
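If the tag names really are as posted, note that names such as `==0`, `##0`, and `$$TagStarts` are not valid XML element names, so a strict XML parser will reject them. For that exact format, a regular expression may be enough. The following is only a sketch under that assumption, using java.util.regex from the JDK; the class name `RegexPairs` is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPairs {
    // Matches <==N>name</==N><##N>value</##N>; \1 forces the same index N
    // in all four tags. Assumes names and values contain no nested tags
    // and no line breaks.
    private static final Pattern PAIR = Pattern.compile(
            "<==(\\d+)>(.*?)</==\\1>\\s*<##\\1>(.*?)</##\\1>");

    static Map<String, String> extract(String input) {
        // LinkedHashMap keeps the pairs in the order they appear.
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(2), m.group(3));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "<$$TagStarts>\n"
                + "<==0>Name0</==0><##0>Value0</##0>\n"
                + "<==1>Name1</==1><##1>Value1</##1>\n"
                + "</$$TagStarts>";
        System.out.println(extract(input)); // prints {Name0=Value0, Name1=Value1}
    }
}
```

This yields a map rather than a DOM tree, and it will break if the format changes, so a real parser is still the better choice once the tag names are valid.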

Upvotes: 3
