Reputation: 1765
OK, since this is a WordPress problem and it sadly goes a little deeper, I need to remove every occurrence of this parent div together with everything inside it:
<div class="sometestclass">
<img ....>
<div>.....</div>
any other html tags
</div><!-- END: .sometestclass -->
The only idea I have is to match everything that starts with:
<div class="sometestclass">
and ends with:
<!-- END: .sometestclass -->
along with everything in between (I can mark the end of the parent div any way I want; this is just a sample). Does anybody have an idea how to do it with:
<?php $content = preg_replace('?????','',$content); ?>
Upvotes: 0
Views: 9971
Reputation: 414
For the UTF-8 issue, I found a hack in the PHP manual.
So my function looks as follows:
function rem_fi_cat() {
    /* This function removes images from _within_ the article,
     * if these images are enclosed in a "wp-caption" div tag
     * and the article has the post format "image".
     * Only on the home page, the front page and category/archive pages.
     */
    if ( ( is_home() || is_front_page() || is_category() ) && has_post_format( 'image' ) ) {
        $document = new DOMDocument();
        $content  = get_the_content( '', true );
        if ( '' != $content ) {
            /* incl. UTF-8 "hack" as described at
             * http://www.php.net/manual/en/domdocument.loadhtml.php#95251
             */
            $document->loadHTML( '<?xml encoding="UTF-8">' . $content );
            // Copy the live child list first: removing nodes while iterating it can skip siblings.
            foreach ( iterator_to_array( $document->childNodes ) as $item ) {
                if ( $item->nodeType == XML_PI_NODE ) {
                    $document->removeChild( $item ); // remove hack
                    $document->encoding = 'UTF-8';   // insert proper
                }
            }
            $xpath = new DOMXPath( $document );
            // Caption divs usually carry extra classes ("wp-caption alignnone" etc.),
            // so test for the class token rather than the whole attribute value.
            $pDivs = $xpath->query( ".//div[contains(concat(' ', normalize-space(@class), ' '), ' wp-caption ')]" );
            foreach ( $pDivs as $div ) {
                $div->parentNode->removeChild( $div );
            }
            echo preg_replace( "/.*<div class=\"entry-container\">(.*)<\/div>.*/s", "$1", $document->saveHTML() );
        }
    }
}
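Why the class test matters: WordPress caption markup typically puts an alignment class next to "wp-caption" (for example class="wp-caption alignnone"), so an exact @class comparison can silently match nothing. A small standalone check, using made-up sample markup modeled on that output, compares the two XPath tests:

```php
<?php
// Made-up sample markup, modeled on a WordPress caption: the caption div
// carries an alignment class in addition to "wp-caption".
$html = '<p>Intro</p>'
    . '<div id="attachment_7" class="wp-caption alignnone">'
    . '<img src="a.jpg"><p class="wp-caption-text">Caption</p>'
    . '</div>';

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Exact comparison: matches only if the whole attribute value is "wp-caption".
$exact = $xpath->query("//div[@class='wp-caption']")->length;

// Token comparison: matches "wp-caption" anywhere in the space-separated class list,
// but not longer class names such as "wp-caption-text".
$token = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' wp-caption ')]"
)->length;

printf("exact: %d, token: %d\n", $exact, $token); // exact: 0, token: 1
```

The concat/normalize-space idiom is plain XPath 1.0, which is what DOMXPath supports, so it works without registering any extra functions.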
Upvotes: 0
Reputation: 1741
<?php $content = preg_replace('/<div class="sometestclass">.*?<\/div><!-- END: .sometestclass -->/s','',$content); ?>
My RegEx is a bit rusty, but I think this should work. Do note that, as others have said, RegEx is not properly equipped to handle some of the complexities of HTML.
In addition, this pattern won't handle nested div elements with the class sometestclass: the lazy .*? stops at the first END marker it reaches, which belongs to the inner block. You would need recursion for that.
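PCRE, which preg_replace() uses, does support recursion through (?R). A minimal sketch, assuming every block is closed by exactly the "</div><!-- END: .sometestclass -->" marker from the question; the function name remove_marked_divs() is hypothetical:

```php
<?php
// Hypothetical helper: removes every <div class="sometestclass">...</div><!-- END: .sometestclass -->
// block, including blocks nested inside one another, using a recursive PCRE pattern.
function remove_marked_divs(string $content): string
{
    $open  = preg_quote('<div class="sometestclass">', '~');
    $close = preg_quote('</div><!-- END: .sometestclass -->', '~');

    // Inside a block, consume either one character that does not start a marker,
    // or (via (?R)) a complete nested block, so the inner END marker cannot end
    // the outer match early.
    $pattern = "~{$open}(?:(?!{$open}|{$close}).|(?R))*+{$close}~s";

    $result = preg_replace($pattern, '', $content);
    return $result ?? $content; // preg_replace() returns null on error (e.g. a PCRE limit)
}

echo remove_marked_divs(
    '<p>Keep</p><div class="sometestclass"><div class="sometestclass">inner</div>'
    . '<!-- END: .sometestclass --></div><!-- END: .sometestclass -->'
); // prints <p>Keep</p>
```

The possessive *+ is safe here because the two alternatives can never match at the same position, and it stops a block with a missing END marker from causing heavy backtracking.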
Upvotes: 6
Reputation: 268364
I wouldn't use a regular expression. Instead, I would use the DOMDocument class. Just find all of the div elements with that class and remove them from their parent(s):
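$html = "<p>Hello World</p>
<div class='sometestclass'>
<img src='foo.png'/>
<div>Bar</div>
</div>";
// Parse the fragment; libxml wraps it in <html><body>...</body></html>.
$dom = new DOMDocument;
$dom->loadHTML( $html );
// Select every div whose class attribute is exactly "sometestclass".
$xpath = new DOMXPath( $dom );
$pDivs = $xpath->query(".//div[@class='sometestclass']");
// Detach each match, and everything inside it, from its parent.
foreach ( $pDivs as $div ) {
    $div->parentNode->removeChild( $div );
}
// Strip the <html>/<body> wrapper again before printing.
echo preg_replace( "/.*<body>(.*)<\/body>.*/s", "$1", $dom->saveHTML() );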
Which results in:
<p>Hello World</p>
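The final preg_replace() only strips the <html>/<body> wrapper that loadHTML() adds. A sketch of a regex-free alternative serializes each child of body separately with DOMDocument::saveHTML($node) (the node argument exists since PHP 5.3.6). The function name remove_divs_with_class() is hypothetical, and it matches the class as a token, so a div with class="sometestclass other" is removed as well:

```php
<?php
// Hypothetical helper: removes every <div> that has $class in its class list
// and returns the remaining <body> content without using a regex.
function remove_divs_with_class(string $html, string $class): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // $class is interpolated into the XPath expression, so it must not contain quotes.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' {$class} ')]";
    foreach ($xpath->query($query) as $div) {
        $div->parentNode->removeChild($div);
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    $out = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child); // serialize one node at a time
        }
    }
    return $out;
}

echo remove_divs_with_class("<p>Hello World</p>
<div class='sometestclass'>
<img src='foo.png'/>
<div>Bar</div>
</div>", 'sometestclass');
```

Serializing the body's children avoids depending on how saveHTML() formats the surrounding <html> and <body> tags.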
Upvotes: 9