I have a large string (a few paragraphs long).
I need to remove all text from this string that is not surrounded by any HTML tags.
For example, this string:
<h1>This is the title</h1>This is a bit of text with no HTML around it<p>This is within a paragraph tag</p>
Should be converted to:
<h1>This is the title</h1><p>This is within a paragraph tag</p>
I believe this is best done with a regex, although I am not very familiar with its syntax.
All help is greatly appreciated.
This is what I ended up using:
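<?php
$string = '<h1>This is the title</h1>This is a bit of text with no HTML around it<p>This is within a paragraph tag</p>';
// Group 1: a closing tag; [^<]*: the bare text after it; group 2: the next tag
$pattern = '/(<\/[^>]+>)[^<]*(<[^>]+>)/';
// Keep both captured tags and drop the text between them
$replacement = '$1$2';
echo preg_replace($pattern, $replacement, $string);
// Output: <h1>This is the title</h1><p>This is within a paragraph tag</p>
?>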
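The pattern above only removes text that sits between a closing tag and the next tag, so bare text before the first tag or after the last closing tag survives. A minimal sketch that covers those two cases as well, assuming flat, non-nested markup like the example; the function name `stripBareTextRegex` and the two extra patterns are illustrative additions, not part of the original code:

```php
<?php
// Hypothetical helper: the pattern from the question plus two extra patterns
// (an assumption of this sketch) for leading and trailing bare text.
// Assumes flat markup: nested tags are not handled correctly.
function stripBareTextRegex(string $html): string
{
    $patterns = [
        '/^[^<]+/',                    // bare text before the first tag
        '/(<\/[^>]+>)[^<]*(<[^>]+>)/', // bare text between a closing tag and the next tag
        '/(<\/[^>]+>)[^<]+$/',         // bare text after the last closing tag
    ];
    $replacements = ['', '$1$2', '$1'];
    // preg_replace applies each pattern in order to the result of the previous one.
    return preg_replace($patterns, $replacements, $html);
}

echo stripBareTextRegex('Intro<h1>This is the title</h1>This is a bit of text with no HTML around it<p>This is within a paragraph tag</p>Outro');
// Output: <h1>This is the title</h1><p>This is within a paragraph tag</p>
```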
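You could use this regex:
(<\/[^>]+>)[^<]*(<[^>]+>)
and replace each match with $1$2. Group 1 captures a closing tag, [^<]* consumes the bare text after it, and group 2 captures the next tag, so only the two tags are kept.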
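One caveat: the regex treats any text after any closing tag as bare, so with nested markup such as `<p>a <b>b</b> c</p>` it also deletes " c", which is inside the `<p>`. If the input can contain nested tags, a parser is safer. The sketch below swaps in a different technique, PHP's DOM extension, and removes only the text nodes at the top level of the fragment. The function name `stripBareTextDom` and the wrapper `<div>` are assumptions of this sketch, and it assumes ASCII/Latin-1 input, since `loadHTML` does not assume UTF-8 by default:

```php
<?php
// Hypothetical helper using DOMDocument instead of a regex: only text nodes
// that are direct children of the fragment (text no tag encloses) are removed.
function stripBareTextDom(string $html): string
{
    $doc = new DOMDocument();
    // Silence libxml warnings about HTML5 tags or sloppy markup.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one container (an assumption of this sketch) so
    // every top-level node becomes a child of a single root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $root = $doc->documentElement;
    // Copy the live node list first: removing while iterating it skips nodes.
    foreach (iterator_to_array($root->childNodes) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $root->removeChild($child);
        }
    }

    // Serialize the remaining top-level nodes without the wrapper <div>.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripBareTextDom('<h1>This is the title</h1>This is a bit of text with no HTML around it<p>This is <b>within</b> a paragraph tag</p>');
```

Unlike the regex, text inside `<p>` that follows the inner `</b>` is kept, because it is not a direct child of the fragment root.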