Reputation: 2815
I've been dwelling on this for a while.
I have this string (there are more contents before and after the h2 tags):
...<h2 style='line-height: 44px;'><p>Lorem Ipsum</p></h2>...
What regex do I use to remove all the <p> and </p> tags inside those header tags?
I'm trying to do something like this, but the positive lookbehind one is not working:
// for the starting <p> tag
$str = preg_replace('/(?<=<h[1-6]{1}[^>]+>)\s*<p>/i', '', $str);
// for the ending </p> tag
$str = preg_replace('/<\/p>\s*(?=<\/h[1-6]{1}>\s*)/i', '', $str);
This also does not take into account paragraph tags nested deeper inside the text within the <h2> tag.
[Update]
This is derived from one of PeeHaa's suggested links
// for the starting <p> tag
$str = preg_replace("#(<h[1-6].*?>)<p.*?>#", '$1', $str);
// for the ending </p> tag
$str = preg_replace("#<\/p>(<\/h[1-6]>)#", '$1', $str);
Upvotes: 1
Views: 3460
Reputation: 9567
You shouldn't try to parse HTML with regexes. Having said that, since this is a subset of HTML and not a full document or nested layout, it is possible:
$str = preg_replace('/(<h([1-6])[^>]*>)\s?<p>(.*?)<\/p>\s?(<\/h\2>)/', '$1$3$4', $str);
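If the <p> tags can also sit deeper inside the heading (the case the question says is not covered), one possible approach is to match each heading as a whole and strip the paragraph tags from its content in a callback. This is only a rough sketch: the helper name strip_p_in_headings is made up for illustration, and it assumes headings are not nested inside each other and that no attribute value contains a ">" character.

```php
<?php
// Hypothetical helper (not from the original posts): removes <p ...> and </p>
// tags that appear anywhere inside h1-h6 elements, leaving other <p> tags alone.
function strip_p_in_headings(string $html): string
{
    $result = preg_replace_callback(
        // 1: opening heading tag, 2: heading level, 3: heading content, 4: closing tag
        '/(<h([1-6])\b[^>]*>)(.*?)(<\/h\2>)/is',
        function (array $m): string {
            // Strip every opening or closing <p> tag from the heading content only.
            // \b keeps tags such as <pre> or <param> from matching.
            $inner = preg_replace('/<\/?p\b[^>]*>/i', '', $m[3]);
            return $m[1] . $inner . $m[4];
        },
        $html
    );

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$str = "<div><p>keep</p></div><h2 style='line-height: 44px;'><p>Lorem <b>Ipsum</b></p></h2><p>after</p>";
echo strip_p_in_headings($str), PHP_EOL;
// prints: <div><p>keep</p></div><h2 style='line-height: 44px;'>Lorem <b>Ipsum</b></h2><p>after</p>
```

Matching the whole heading first also sidesteps the lookbehind problem from the question: PCRE lookbehinds generally need a bounded length, and [^>]+ is not bounded.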
Upvotes: 3
Reputation: 72681
And many many many others (I could have added 100+ more).
Basically the thing is:
Don't try to parse HTML using regex. HTML is not a regular language.
Use an HTML parser for this.
For example: http://php.net/manual/en/book.dom.php
Upvotes: 1