Reputation: 309
I am trying to write the following regular expressions for a program of mine that parses a given HTML file. Could you help me with any of them?
<div>
<div class="menuItem">
<span>
class="emph"
Any string beginning with < and ending with >, i.e. all tags.
The contents of the body tag.
The contents of all divs
All divs that make menus
I have managed to figure out that the single div tag is simply "<div>"
and the "all tags" expression is <(\"[^\"]*\"|'[^']*'|[^'\">])*>
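A minimal sketch of how these expressions could be checked, assuming Python's `re` module (the question does not name a language, so Python is only an assumption). The sample string and all variable names are made up for illustration, and the `\"` escapes from the string-literal form above are written as plain quotes inside a raw string.

```python
import re

# Made-up sample input; note the '>' inside the title attribute value.
html = ('<div class="menuItem"><a title="x > y">Home</a></div>'
        '<div><span>hi</span> <b class="emph">!</b></div>')

# Literal patterns for the first four items in the list.
single_div = re.compile(r'<div>')
menu_div = re.compile(r'<div class="menuItem">')
span_tag = re.compile(r'<span>')
emph_attr = re.compile(r'class="emph"')

# The "all tags" expression: a '<', then any mix of double-quoted strings,
# single-quoted strings, or characters that are not quotes or '>', then '>'.
all_tags = re.compile(r'''<("[^"]*"|'[^']*'|[^'">])*>''')

# Collect the full text of every tag found in the sample.
tags = [m.group(0) for m in all_tags.finditer(html)]
for tag in tags:
    print(tag)
```

The quoted-string alternatives are what keep `<a title="x > y">` in one piece: a simpler `<[^>]*>` would stop at the `>` inside the attribute value.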
Do you think you could help me with any of the rest? Thank you in advance guys...
I know that HTML parsing is an already solved problem and that regex is not an efficient way to do it. However, I have been asked to do it this way in order to demonstrate how regular expressions work, even when they (sometimes) become long and detailed. That is why I am treating the HTML file as a plain text file and need to apply these regular expressions to it.
Upvotes: 1
Views: 266
Reputation: 909
Please, for your own sanity, consider using an HTML parser library for the language you are using. Regexps are not a suitable tool for this application - they cannot reliably or cleanly handle structured data like HTML.
https://stackoverflow.com/a/1732454/457201
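As a rough illustration of the parser approach, here is a minimal sketch assuming Python and its standard-library `html.parser` module (the question does not say which language is in use). `MenuItemCollector` and `menu_items` are hypothetical names invented for this example. It collects the text inside every `<div class="menuItem">` and counts nested divs, which is the part a single regular expression cannot do reliably.

```python
from html.parser import HTMLParser


class MenuItemCollector(HTMLParser):
    """Collects the text inside each <div class="menuItem"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while inside a menuItem div; counts nested divs
        self.items = []   # text of each completed menuItem div
        self._buf = []    # text fragments of the div currently being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1          # a div nested inside a menu item
        elif "menuItem" in classes:
            self.depth = 1           # entering a new menu item
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:      # the menu item's own closing tag
                self.items.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buf.append(data)


def menu_items(html):
    """Return the text content of every menuItem div in the given HTML string."""
    parser = MenuItemCollector()
    parser.feed(html)
    parser.close()
    return parser.items
```

For example, `menu_items('<div class="menuItem">Home <div>sub</div></div><div>other</div>')` skips the plain div and keeps the nested one as part of its menu item.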
Upvotes: 4