Reputation: 10348
I have a string that contains a DIV tag I need to remove.
I can identify the DIV to remove by its attributes (in this case a specific style value), which are unique. This DIV contains a lot of HTML, including other DIVs.
<div style="padding-top: 10px; clear: both; width: 100%;">
{ a lot other divs here}
</div>
How can I remove it from the string?
EDIT: (Any useful technique is welcome)
EDIT 2: I know about the drawbacks of using regular expressions. If you have a solution using regexes,
it is welcome too, because this is a one-time parsing process, the text is very small, and the HTML is well-formed (indeed, it is XHTML).
EDIT 3: If possible, please show an example using an HTML/DOM parser, XPath, or anything else. The problem here is not selecting data but removing it. Can this be done with an HTML/DOM parser or XPath?
Upvotes: 0
Views: 200
Reputation: 2111
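XPath is the easiest approach. Note that jQuery's own selectors are CSS-based, so using XPath alongside jQuery needs a plugin or the browser's document.evaluate. See the reference: http://saxon.sourceforge.net/saxon6.5/expressions.html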
Since XPath is based on location paths, you can specify how deep you want to go, much like working with file paths.
You can try something like //{Tag above div}/div
This is different from //div, because // matches at any depth: //div alone returns every DIV anywhere in the document, so the tag you put right after // has to be unique. You can even start from //html and use / to walk down the DOM tree step by step, like entering an address. There shouldn't be that many levels between html and your first div.
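Since the question asks for removal rather than selection, here is a minimal sketch of how the XPath idea could be applied in Python with the standard library's xml.etree.ElementTree. It assumes the input really is well-formed XHTML with a single root element, no default namespace, and no HTML-only entities such as &nbsp;. The function name remove_div_by_style and the sample markup are made up for illustration. It selects the DIV by its style attribute instead of by its position in the tree.

```python
import xml.etree.ElementTree as ET

# Style value taken from the question; it identifies the DIV to remove.
TARGET_STYLE = "padding-top: 10px; clear: both; width: 100%;"


def remove_div_by_style(xhtml: str, style: str = TARGET_STYLE) -> str:
    """Return the markup with every <div style="..."> matching `style` removed."""
    root = ET.fromstring(xhtml)
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # ElementTree supports a limited XPath subset, including attribute predicates.
    for div in root.findall(f".//div[@style='{style}']"):
        # Caveat: remove() also drops any text that directly follows the div (its tail).
        parent_of[div].remove(div)
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample input, assumed to be well-formed.
sample = (
    "<body>"
    "<p>keep me</p>"
    '<div style="padding-top: 10px; clear: both; width: 100%;">'
    "<div><div>nested</div></div>"
    "</div>"
    "<div>keep me too</div>"
    "</body>"
)
cleaned = remove_div_by_style(sample)
print(cleaned)
```

If the document declares the XHTML namespace, the tag in the path would need to be namespace-qualified, so treat this as a starting point rather than a drop-in solution.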
Upvotes: 0
Reputation: 1387
Remember that HTML is not a regular language, so regular expressions cannot parse it in general; in particular, they cannot reliably match arbitrarily nested tags such as DIVs inside DIVs. I would recommend using an HTML parser.
You can read more about regular languages here: http://en.wikipedia.org/wiki/Regular_language, and on the Chomsky language classification here: http://en.wikipedia.org/wiki/Chomsky_hierarchy
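To illustrate the parser route, here is a rough sketch using Python's standard html.parser module. The class name DivStripper and the helper strip_div are invented for this example. It keeps a depth counter while inside the unwanted DIV, which is exactly the nesting bookkeeping a regular expression cannot do. As written it drops comments and doctype declarations from the output, so it is a sketch under those assumptions, not a complete serializer.

```python
from html.parser import HTMLParser


class DivStripper(HTMLParser):
    """Re-emit the markup, skipping one kind of <div> and everything inside it."""

    def __init__(self, style):
        # convert_charrefs=False keeps entities such as &amp; intact in the output.
        super().__init__(convert_charrefs=False)
        self.style = style
        self.depth = 0  # > 0 while inside the div being removed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div: one more closing tag to skip
            return
        if tag == "div" and dict(attrs).get("style") == self.style:
            self.depth = 1  # entering the div to remove
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> never change the nesting depth.
        if not self.depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth:
            if tag == "div":
                self.depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.depth:
            self.out.append(f"&#{name};")


def strip_div(html, style):
    parser = DivStripper(style)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


style = "padding-top: 10px; clear: both; width: 100%;"
html_in = (
    "<p>keep &amp; me<br/></p>"
    f'<div style="{style}"><div><div>nested</div></div></div>'
    "<div>keep me too</div>"
)
result = strip_div(html_in, style)
print(result)
```

Unlike an XML parser, this does not require the input to be well-formed, though unbalanced DIVs inside the removed block would throw the counter off.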
Upvotes: 1