Alex Encore

Reputation: 309

Regular expressions for HTML

I am trying to write the following regular expressions for a program of mine that parses a given HTML file. Could you help me with any of them?

<div>
<div class="menuItem">
<span>
class="emph"
Any string beginning with < and ending with >, i.e. all tags. 
The contents of the body tag.
The contents of all divs 
All divs that make menus

I have managed to figure out that the single div tag is simply "<div>" and the "all tags" expression is <(\"[^\"]*\"|'[^']*'|[^'\">])*>
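To check those two expressions against some input, a minimal Python sketch might look like the following. The sample string is made up for illustration, and the pattern is written as a raw string so the quotes inside it need no backslashes.

```python
import re

# The literal "<div>" tag and the "all tags" expression from the post.
# In a raw string the quotes need no escaping, so \" becomes plain ".
SINGLE_DIV = re.compile(r"<div>")
ALL_TAGS = re.compile(r"""<("[^"]*"|'[^']*'|[^'">])*>""")

# Hypothetical sample input, not taken from the original file.
sample = '<div class="menuItem"><span title="a > b">Home</span></div>'

# finditer + group(0) returns the whole match rather than the inner group.
tags = [m.group(0) for m in ALL_TAGS.finditer(sample)]
print(tags)
```

The quoted alternatives are what let the pattern survive a ">" inside an attribute value, as in the title attribute of the sample.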

Do you think you could help me with any of the rest? Thank you in advance guys...

I know that HTML parsing is an already solved problem and that regex is not an efficient way to do it. However, I have been asked to do it this way in order to demonstrate how regular expressions work, even when they become long and detailed. That is why I am treating the HTML file as a plain text file and need to apply these regular expressions to it.
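For the remaining items in the list, a rough sketch under some assumptions could look like this: attribute values use straight double quotes, tag names are lowercase, and divs are not nested inside each other. The pattern names and the sample document are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical sample document, not taken from the original file.
html = """<html><body>
<div class="menuItem"><span class="emph">Home</span></div>
<div>Plain text</div>
</body></html>"""

# Opening tags; \s* tolerates whitespace before the closing bracket.
DIV_MENU = re.compile(r'<div\s+class="menuItem"\s*>')
SPAN = re.compile(r"<span\b[^>]*>")      # any opening span, with or without attributes
CLASS_EMPH = re.compile(r'class="emph"')

# Contents of the body tag: DOTALL lets "." cross newlines, and the lazy
# quantifier stops at the first closing </body>.
BODY = re.compile(r"<body\b[^>]*>(.*?)</body>", re.DOTALL)

# Contents of each div, and of each menu div. These break on nested divs,
# because the lazy match stops at the first </div> it meets.
DIV_CONTENTS = re.compile(r"<div\b[^>]*>(.*?)</div>", re.DOTALL)
MENU_CONTENTS = re.compile(r'<div\s+class="menuItem"\s*>(.*?)</div>', re.DOTALL)

div_contents = DIV_CONTENTS.findall(html)
menu_contents = MENU_CONTENTS.findall(html)
body_contents = BODY.search(html).group(1)
```

The nesting limitation is the important caveat: as soon as one div contains another, the "contents of all divs" pattern returns truncated results.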

Upvotes: 1

Views: 266

Answers (1)

D_Bye

Reputation: 909

Please, for your own sanity, consider using an HTML parser library for the language you are using. Regexps are not a suitable tool for this application - they cannot reliably or cleanly handle structured data like HTML.

https://stackoverflow.com/a/1732454/457201
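As an illustration of the parser approach, here is a minimal sketch that assumes Python and its standard-library html.parser module (the question does not say which language is in use). The MenuItemCollector class and the sample input are hypothetical. It collects the text of every menuItem div and keeps a depth counter, which is the part a regular expression cannot do for nested divs.

```python
from html.parser import HTMLParser


class MenuItemCollector(HTMLParser):
    """Collects the text inside every <div class="menuItem">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # greater than 0 while inside a menuItem div
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            # A div nested inside a menu item: track it so that its
            # closing tag does not end the menu item early.
            self.depth += 1
        elif "menuItem" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.items[-1] += data


collector = MenuItemCollector()
collector.feed(
    '<div class="menuItem"><div><span class="emph">Home</span></div></div>'
    "<div>Other</div>"
)
items = collector.items
print(items)
```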

Upvotes: 4
