Andy

Reputation: 3021

Strip Tags and everything in between

How can I strip <h1>including this content</h1>?

I know you can use strip_tags to remove the tags, but I want everything in between gone as well.

Any help would be appreciated.

Upvotes: 16

Views: 17127

Answers (5)

wasatz

Reputation: 4278

You could use an XSLT stylesheet that copies every element through unchanged except for h1, which is matched by an empty template so it produces no output, and then apply it to your document. That might be a bit heavyweight for something as simple as this, though.
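As a rough illustration of that idea, here is a minimal sketch using PHP's XSLTProcessor. It assumes the xsl extension is installed and that the input is well-formed XML (real-world HTML often is not); the function name stripH1WithXslt is made up for this example.

```php
<?php
// Sketch only: needs the xsl extension, and the input must be well-formed XML.
$xsl = <<<'XSL'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <!-- Identity template: copy every node and attribute unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Empty template: h1 elements and their content produce no output -->
  <xsl:template match="h1"/>
</xsl:stylesheet>
XSL;

// Hypothetical helper name, chosen for this example.
function stripH1WithXslt(string $xml, string $xsl): string
{
    $source = new DOMDocument();
    $source->loadXML($xml);

    $stylesheet = new DOMDocument();
    $stylesheet->loadXML($xsl);

    $processor = new XSLTProcessor();
    $processor->importStylesheet($stylesheet);

    // transformToXml() returns the result as a string; trim the trailing newline.
    return trim((string) $processor->transformToXml($source));
}

if (class_exists('XSLTProcessor')) {
    echo stripH1WithXslt('<div>Hello<h1>including this content</h1> There !!</div>', $xsl);
}
```

The identity template plus one empty template is the usual XSLT pattern for "keep everything except X"; adding more element names to the empty template's match would drop those as well.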

Upvotes: 0

Kanak Vaghela

Reputation: 8358

You can also use strip_tags to remove the tags, though note that it strips only the tags themselves and leaves the text in between.

Here $html contains the HTML or PHP source you want to remove the tags from:

strip_tags($html, "");

Try this; it may work for you if removing the tags alone is enough.
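For reference, a quick check of what strip_tags returns for the input from the question. As far as the PHP manual describes it, the function drops the tags but keeps the text between them, so it does not cover the "everything in between" part on its own. The variable names are just for this sketch.

```php
<?php
$html = 'Hello<h1>including this content</h1> There !!';

// An empty second argument means no tags are allowed through.
$stripped = strip_tags($html, "");

echo $stripped;
// prints: Helloincluding this content There !!
```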

Upvotes: -3

Gumbo

Reputation: 655269

As you’re dealing with HTML, you should use an HTML parser to process it correctly. You can use PHP’s DOMDocument and query the elements with DOMXPath, e.g.:

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1') as $node) {
    $node->parentNode->removeChild($node);
}
$html = $doc->saveHTML();
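One thing to watch with the snippet above: saveHTML() on the whole document also emits a doctype and html/body wrapper, which is usually not wanted when $html is only a fragment. A possible way around that, sketched under the assumption that wrapping the fragment in a temporary div is acceptable (the helper name removeElements and the "wrapper" id are made up for this example):

```php
<?php
// Hypothetical helper: remove every <$tagName> element (with its content) from an HTML fragment.
// $tagName is interpolated into the XPath query, so pass only trusted tag names.
function removeElements(string $html, string $tagName): string
{
    $doc = new DOMDocument();

    // Silence libxml warnings about imperfect markup, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common trick to make loadHTML() treat the input as UTF-8.
    $doc->loadHTML('<?xml encoding="utf-8" ?><div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Copy the matches into an array first so removal cannot disturb the iteration.
    foreach (iterator_to_array($xpath->query('//' . $tagName)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the children of the temporary wrapper, not the whole document.
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);
    $result = '';
    foreach ($wrapper->childNodes as $child) {
        $result .= $doc->saveHTML($child);
    }
    return $result;
}

echo removeElements('Hello<h1>including this content</h1> There !!', 'h1');
// prints: Hello There !!
```

Because this goes through a real parser, nested elements and attributes containing > are handled the same way as the simple case.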

Upvotes: 26

maček

Reputation: 77778

If you want to strip ALL tags, including their content:

$yourString = 'Hello <div>Planet</div> Earth. This is some <span class="foo">sample</span> content!';
$regex = '/<[^>]*>[^<]*<[^>]*>/';
echo preg_replace($regex, '', $yourString);
#=> Hello  Earth. This is some  content!

HTML attributes can contain < or >. So, if your HTML gets too messy this method will not work and you'll need a DOM parser.
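Nested tags are another case where this pattern goes wrong, since it simply pairs each tag with the next tag it finds. A small sketch with a made-up input string to show the effect:

```php
<?php
$regex = '/<[^>]*>[^<]*<[^>]*>/';
$nested = '<div><b>x</b> y</div>';

// The first match is "<div><b>" and the second is "</b> y</div>",
// so the inner "x" is left behind instead of being removed.
$result = preg_replace($regex, '', $nested);

echo $result;
// prints: x
```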


Regular Expression Explanation

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'
--------------------------------------------------------------------------------
  [^<]*                    any character except: '<' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  [^>]*                    any character except: '>' (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  >                        '>'

Upvotes: 9

Sarfraz

Reputation: 382696

Try this:

preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', '<h1>including this content</h1>');

Example:

echo preg_replace('/<h1[^>]*>([\s\S]*?)<\/h1[^>]*>/', '', 'Hello<h1>including this content</h1> There !!');

Output:

Hello There !!
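If you need this for more than one tag, the same pattern can be wrapped in a small function. This is only a sketch: the name stripTagWithContent is made up here, and the i flag and the \b word boundary are additions to the pattern above, not part of it.

```php
<?php
// Hypothetical helper: remove <$tag>...</$tag> including the content, using a regex.
function stripTagWithContent(string $html, string $tag): string
{
    $name = preg_quote($tag, '#');

    // i: tag names match case-insensitively; s: "." also matches newlines.
    // \b keeps a short name such as "b" from also matching "<body>".
    $pattern = '#<' . $name . '\b[^>]*>.*?</' . $name . '\s*>#is';

    // preg_replace() returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

echo stripTagWithContent("Hello<H1 class=\"title\">including\nthis content</H1> There !!", 'h1');
// prints: Hello There !!
```

Like any regex approach, this still breaks when the same tag is nested inside itself or when an attribute value contains >, so a DOM parser remains the safer choice for messy HTML.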

Upvotes: 9
