Reputation: 2315
$str = 'some text <MY_TAG>tag <em>contents </em></MY_TAG> more text ';
My questions are:
How do I retrieve the content (tag <em>contents </em>) that is between <MY_TAG> .. </MY_TAG>?
And how do I remove <MY_TAG> and its contents from $str?
I am using PHP.
Thank you.
Upvotes: 12
Views: 23338
Reputation: 1354
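I tested this function; it works for nested tags too. Pass TRUE as the third argument to remove the listed tags together with their contents, or FALSE to keep the listed tags and remove every other tag with its contents. Found here: https://www.php.net/manual/en/function.strip-tags.php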
<?php
function strip_tags_content($text, $tags = '', $invert = FALSE) {
    // Extract the tag names from a string such as '<b><div>'.
    preg_match_all('/<(.+?)[\s]*\/?[\s]*>/si', trim($tags), $tags);
    $tags = array_unique($tags[1]);
    if(is_array($tags) AND count($tags) > 0) {
        if($invert == FALSE) {
            // Remove every tag (with contents) except the listed ones.
            return preg_replace('@<(?!(?:'. implode('|', $tags) .')\b)(\w+)\b.*?>.*?</\1>@si', '', $text);
        }
        else {
            // Remove only the listed tags (with contents).
            return preg_replace('@<('. implode('|', $tags) .')\b.*?>.*?</\1>@si', '', $text);
        }
    }
    elseif($invert == FALSE) {
        // No tags listed: remove all tags with their contents.
        return preg_replace('@<(\w+)\b.*?>.*?</\1>@si', '', $text);
    }
    return $text;
}
// Sample text:
$text = '<b>sample</b> text with <div>tags</div>';
// Result for:
echo strip_tags_content($text);
// text with
// Result for:
echo strip_tags_content($text, '<b>');
// <b>sample</b> text with
// Result for:
echo strip_tags_content($text, '<b>', TRUE);
// text with <div>tags</div>
Upvotes: 1
Reputation: 5107
For removal I ended up just using this:
$str = preg_replace('~<MY_TAG(.*?)</MY_TAG>~si', "", $str);
Using ~ instead of / as the delimiter solved the errors thrown because of the forward slash in the end tag, which seemed to be an issue even with escaping. Leaving > off the opening tag allows for attributes or other characters and still matches the tag and all of its contents.
This only works where nesting is not a concern.
The si modifiers mean s = the dot also matches line breaks, i = case insensitive. The lazy quantifier .*? already makes the match ungreedy, so the U (ungreedy) modifier is left out: under U, a trailing ? flips the quantifier back to greedy.
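One detail about the modifiers is easy to check: in PCRE the U modifier inverts the default greediness, so a quantifier followed by ? becomes greedy under U. A minimal sketch of the difference, using a made-up sample string ($sample, $lazy and $inverted are illustrative names, not from the original answer):

```php
<?php
// Hypothetical sample with two separate MY_TAG blocks.
$sample = 'a<MY_TAG>one</MY_TAG>b<MY_TAG>two</MY_TAG>c';

// Lazy quantifier without U: each block is matched and removed on its own.
$lazy = preg_replace('~<MY_TAG(.*?)</MY_TAG>~si', '', $sample);

// U plus "?": the quantifier is greedy again, so a single match runs from
// the first opening tag to the last closing tag and "b" is lost as well.
$inverted = preg_replace('~<MY_TAG(.*?)</MY_TAG>~Usi', '', $sample);

var_dump($lazy);     // string(3) "abc"
var_dump($inverted); // string(2) "ac"
```

With only one MY_TAG block in the input both patterns give the same result, which is probably why the difference is easy to miss.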
Upvotes: 14
Reputation: 41827
You do not want to use regular expressions for this. A much better solution would be to load your contents into a DOMDocument and work on it using the DOM tree and standard DOM methods:
$document = new DOMDocument();
$document->loadXML('<root/>');
// Build a fragment from the raw text; appendXML() requires well-formed XML.
$fragment = $document->createDocumentFragment();
$fragment->appendXML($myTextWithTags);
$document->documentElement->appendChild($fragment);
$MY_TAGs = $document->getElementsByTagName('MY_TAG');
foreach($MY_TAGs as $MY_TAG)
{
    $xmlContent = $document->saveXML($MY_TAG);
    /* work on $xmlContent here */
    /* as a further example: */
    $ems = $MY_TAG->getElementsByTagName('em');
    foreach($ems as $em)
    {
        $emphasizedText = $em->nodeValue;
        /* do your operations here */
    }
}
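To connect the DOM approach to the two questions (retrieve the contents, then remove the tag), here is a minimal sketch. It assumes the input is a well-formed XML fragment; the sample string and the names $contents and $remaining are illustrative assumptions.

```php
<?php
// Assumed sample input; appendXML() only accepts well-formed XML.
$str = 'some text <MY_TAG>tag <em>contents </em></MY_TAG> more text ';

$document = new DOMDocument();
$document->loadXML('<root/>');
$fragment = $document->createDocumentFragment();
$fragment->appendXML($str);
$document->documentElement->appendChild($fragment);

$contents = [];
// Copy the live node list to an array first: removing nodes while
// iterating the list directly can skip elements.
foreach (iterator_to_array($document->getElementsByTagName('MY_TAG')) as $tag) {
    // Serialize the children to get the inner markup of the tag.
    $inner = '';
    foreach ($tag->childNodes as $child) {
        $inner .= $document->saveXML($child);
    }
    $contents[] = $inner;
    // Detach the tag, together with everything inside it.
    $tag->parentNode->removeChild($tag);
}

// Serialize what is left inside the <root> wrapper.
$remaining = '';
foreach ($document->documentElement->childNodes as $child) {
    $remaining .= $document->saveXML($child);
}

var_dump($contents[0]); // string(22) "tag <em>contents </em>"
var_dump($remaining);   // string(21) "some text  more text "
```

If the text is not guaranteed to be well-formed, appendXML() returns false, so its return value is worth checking before relying on the result.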
Upvotes: 2
Reputation: 33197
Although the only fully correct way to do this is not to use regular expressions, you can get what you want if you accept it won't handle all special cases:
preg_match('/<em[^>]*?>.*?<\/em>/i', $str, $match);
// Use this only if you aren't worried about nested tags.
// It will handle tags with attributes.
And
$str = preg_replace('/<MY_TAG[^>]*?>.*?<\/MY_TAG>/i', '', $str);
Upvotes: 1
Reputation: 655189
If MY_TAG cannot be nested, try this to get the matches:
preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
To remove them, use preg_replace with the same pattern instead.
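Put together, retrieval and removal might look like the sketch below. The sample string is an assumption based on the question, and $contents and $stripped are illustrative names.

```php
<?php
// Assumed sample string, modelled on the one in the question.
$str = 'some text <MY_TAG>tag <em>contents </em></MY_TAG> more text ';

// $matches[0] holds the full matches, $matches[1] the captured contents.
preg_match_all('/<MY_TAG>(.*?)<\/MY_TAG>/s', $str, $matches);
$contents = $matches[1];

// The same pattern with preg_replace removes each tag and its contents.
$stripped = preg_replace('/<MY_TAG>(.*?)<\/MY_TAG>/s', '', $str);

var_dump($contents[0]); // string(22) "tag <em>contents </em>"
var_dump($stripped);    // string(21) "some text  more text "
```

Note that the pattern expects the literal opening tag <MY_TAG>, so an opening tag with attributes would not match.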
Upvotes: 13