Reputation: 343
I have the HTML-like string below, and I want to build a DOM tree from it and get the name/value pairs. How can I do this with an HTML parser, an XML parser, or a regular expression? Any code snippet would be useful. Thanks.
<$$TagStarts>
<==0>Name0</==0><##0>Value0</##0>
<==1>Name1</==1><##1>Value1</##1>
<==2>Name2</==2><##2>Value2</##2>
<==3>Name3</==3><##3>Value3</##3>
<==4>Name4</==4><##4>Value4</##4>
<==5>Name5</==5><##5>Value5</##5>
</$$TagStarts>
Upvotes: 0
Views: 575
Reputation: 13974
Assuming the tag names are just placeholders and your real markup has meaningful tag names, try one of the following HTML parsers:
http://home.ccil.org/~cowan/XML/tagsoup/
http://nekohtml.sourceforge.net/
They give you a W3C-compliant Document object. After that, it is just a matter of calling getElementsByTagName or getElementById, or using XPath or XQuery to get the elements from the DOM.
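As a rough sketch of the getElementsByTagName step: the example below assumes the real markup uses the hypothetical tag names `name` and `value` and is already well-formed, so it uses the JDK's built-in DocumentBuilder as a stand-in for the Document that TagSoup or NekoHTML would hand back. The class name `DomPairs` and the `pairs` wrapper element are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPairs {
    // Assumption: names live in <name> elements and values in <value> elements,
    // in matching document order. These tag names are hypothetical.
    static Map<String, String> extract(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            NodeList names = doc.getElementsByTagName("name");
            NodeList values = doc.getElementsByTagName("value");

            // LinkedHashMap keeps the pairs in document order.
            Map<String, String> pairs = new LinkedHashMap<>();
            int count = Math.min(names.getLength(), values.getLength());
            for (int i = 0; i < count; i++) {
                pairs.put(names.item(i).getTextContent(),
                          values.item(i).getTextContent());
            }
            return pairs;
        } catch (Exception e) {
            // Parser configuration, I/O, and malformed-input errors all end up here.
            throw new IllegalArgumentException("Could not parse input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<pairs><name>Name0</name><value>Value0</value>"
                   + "<name>Name1</name><value>Value1</value></pairs>";
        System.out.println(extract(xml)); // prints {Name0=Value0, Name1=Value1}
    }
}
```

With one of the lenient HTML parsers, only the way the Document is obtained should change; the getElementsByTagName loop works on any org.w3c.dom.Document.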
Otherwise, you can use one of the following, which have their own document object implementations:
http://htmlcleaner.sourceforge.net/ [It also has some basic XPath support]
http://jsoup.org/ [It has a jQuery-like query API]
Addition: check the jsoup selector syntax: http://jsoup.org/cookbook/extracting-data/selector-syntax
I would recommend either jsoup or NekoHTML.
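If the tag names really are the literal `<==0>` and `<##0>` from the question, note that they are not legal XML element names, so a strict XML parser would reject them. In that case a small regular expression may be enough. The following is only a sketch under the assumption that every name tag `<==N>` is immediately followed by a value tag `<##N>` with the same index, as in the sample; `RegexPairs` is a made-up class name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexPairs {
    // Group 1: the numeric index, reused via the \1 backreference so that the
    //          closing tags and the value tag must carry the same number.
    // Group 2: the name text. Group 3: the value text.
    private static final Pattern PAIR = Pattern.compile(
        "<==(\\d+)>(.*?)</==\\1>\\s*<##\\1>(.*?)</##\\1>", Pattern.DOTALL);

    static Map<String, String> extract(String input) {
        // LinkedHashMap keeps the pairs in the order they appear in the input.
        Map<String, String> pairs = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(input);
        while (m.find()) {
            pairs.put(m.group(2), m.group(3));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "<$$TagStarts>\n"
                     + "<==0>Name0</==0><##0>Value0</##0>\n"
                     + "<==1>Name1</==1><##1>Value1</##1>\n"
                     + "</$$TagStarts>";
        System.out.println(extract(input)); // prints {Name0=Value0, Name1=Value1}
    }
}
```

This does not build a DOM tree and will not cope with nested or irregular markup; for anything beyond this flat layout, one of the parsers above is the safer choice.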
Upvotes: 3