Reputation: 451
Let's say I need to get a string inside some h1, h2, or h3 tags:
/<[hH][1-3][^>]*>(.*?)<\/[hH][1-3]>/
This works great if the user decides to take a sane approach to headers:
<h1>My Header</h1>
But knowing my users, they want bold, italic, underlined h1s, and they have that coding quagmire TinyMCE to help them do it. TinyMCE would output:
<h1><b><span style='text-decoration: underline'><i>My Hideous Header</i></span></b></h1>
So my question is:
How do I get the string inside an h1, h2, or h3, and also inside any number of other tags surrounding it?
Thanks, Joe
Upvotes: 1
Views: 1218
Reputation: 17817
If you're in PHP you can use your regex:
/<[hH][1-3][^>]*>(.*?)<\/[hH][1-3]>/
then pass the captured result through the strip_tags() function to get rid of all the insanity inside.
If you are not on PHP, you can pass the result through a regex replace that removes tags. Something like replacing
/<\/?[^>]+?>/
with an empty string.
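For the non-PHP case, here is a minimal sketch of the same two steps in Python: capture the header contents, then delete whatever tags are left inside. The function name header_texts and the sample string are just illustrative, and the re.DOTALL flag is my addition so that a header can span more than one line.

```python
import re

# Header pattern from the question; group 1 is everything between the tags.
# re.DOTALL is an added assumption so "." also matches newlines.
HEADER_RE = re.compile(r"<[hH][1-3][^>]*>(.*?)<\/[hH][1-3]>", re.DOTALL)

# Tag-stripping pattern from this answer: any opening or closing tag.
TAG_RE = re.compile(r"<\/?[^>]+?>")


def header_texts(html):
    """Return the text of every h1-h3 header, with inner tags removed."""
    return [TAG_RE.sub("", inner).strip() for inner in HEADER_RE.findall(html)]


sample = (
    "<h1><b><span style='text-decoration: underline'>"
    "<i>My Hideous Header</i></span></b></h1>"
)
print(header_texts(sample))  # ['My Hideous Header']
```

Like strip_tags(), this keeps all the text in the header, so a header with several styled fragments comes back as one joined string.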
Upvotes: 1
Reputation: 1300
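If you only want to capture the innermost nested text, you could just skip over all the tags inside the header tag (the text ends up in the second group, and \1 refers back to the opening tag name):
/<([hH][1-3])[^>]*>(?:<[^>]+>)*([^<]+).*?<\/\1>/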
Untested, but I think it should work.
Upvotes: -1
Reputation: 124365
/<(h[1-3])[^>]*>(?:.*?>)?([^<]+)(?:<.*?)?<\/\1>/i
It will not be too hard to make cases that break it hideously, since (as I'm sure people will tell you) parsing HTML is a job for an HTML parser, not a regex, but it works for your given case and various similar ones.
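For comparison, here is a rough sketch of the parser route, using Python's standard html.parser module rather than a regex. The class name HeaderText and the helper parse_header_texts are made up for this example, and the code assumes headers are not nested inside one another in any meaningful way.

```python
from html.parser import HTMLParser

HEADER_TAGS = ("h1", "h2", "h3")


class HeaderText(HTMLParser):
    """Collect the text found inside h1, h2, and h3 elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a header
        self.parts = []     # text chunks of the header being read
        self.headers = []   # finished header strings

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H1> arrives as "h1".
        if tag in HEADER_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in HEADER_TAGS and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.headers.append("".join(self.parts).strip())
                self.parts = []

    def handle_data(self, data):
        # Text inside <b>, <span>, <i>, ... still arrives here.
        if self.depth:
            self.parts.append(data)


def parse_header_texts(html):
    parser = HeaderText()
    parser.feed(html)
    parser.close()
    return parser.headers


sample = (
    "<H1><b><span style='text-decoration: underline'>"
    "<i>My Hideous Header</i></span></b></H1><p>ignored</p><h2>Second</h2>"
)
print(parse_header_texts(sample))  # ['My Hideous Header', 'Second']
```

The parser does not care how many wrapper tags TinyMCE adds or what attributes they carry, which is where the regex versions tend to fall over.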
Upvotes: 3