Marcin Bobowski

Reputation: 1765

How to remove entire div with preg_replace

This is a WordPress problem, and sadly it goes a little deeper: I need to remove every occurrence of a parent div together with everything inside it:

<div class="sometestclass">
   <img ....>
   <div>.....</div>
   any other html tags
</div><!-- END: .sometestclass -->

The only idea I have is to match everything that starts with:

<div class="sometestclass">

and ends with:

<!-- END: .sometestclass -->

along with everything in between (I can mark the end of the parent div any way I want; this is just a sample). Does anybody have an idea how to do it with:

<?php $content = preg_replace('?????','',$content); ?>

Upvotes: 0

Views: 9971

Answers (4)

rob_st

Reputation: 414

For the UTF-8 issue, I found a hack in the PHP manual.

So my function looks as follows:

function rem_fi_cat() {
/* This function removes images from _within_ the article,
 * provided the images are enclosed in a "wp-caption" div tag
 * and the post is formatted as "image".
 * Applies only on the home page, front page, and category/archive pages.
 */
if ( (is_home() || is_front_page() || is_category()) && has_post_format( 'image' ) ) {
    $document = new DOMDocument();
    $content = get_the_content( '', true );
    if( '' != $content ) {
        /* incl. UTF-8 "hack" as described at 
         * http://www.php.net/manual/en/domdocument.loadhtml.php#95251
         */
        $document->loadHTML( '<?xml encoding="UTF-8">' . $content );
        foreach ($document->childNodes as $item) {
            if ($item->nodeType == XML_PI_NODE) {
                $document->removeChild($item); // remove hack
                $document->encoding = 'UTF-8'; // insert proper
            }
        }
        $xpath = new DOMXPath( $document );
        $pDivs = $xpath->query(".//div[@class='wp-caption']");

        foreach ( $pDivs as $div ) {
            $div->parentNode->removeChild( $div );
        }

        echo preg_replace( "/.*<div class=\"entry-container\">(.*)<\/div>.*/s", "$1", $document->saveHTML() );

    }
}

}

Upvotes: 0

Blake

Reputation: 1741

<?php $content = preg_replace('/<div class="sometestclass">.*?<\/div><!-- END: \.sometestclass -->/s','',$content); ?>

My RegEx is a bit rusty, but I think this should work. Do note that, as others have said, RegEx is not properly equipped to handle some of the complexities of HTML.

In addition, this pattern won't correctly handle div elements with the class sometestclass that are nested inside one another. You would need recursion for that.
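
As a minimal sketch, the pattern above could be wrapped in a small helper. The function name `strip_marked_div` is hypothetical (it is not part of PHP or WordPress), and the sketch assumes the end marker always has the form `<!-- END: .classname -->` as in the question. `preg_quote` escapes the class name so that any regex metacharacters in it are matched literally:

```php
<?php
// Hypothetical helper: removes every block that starts with
// <div class="CLASS"> and ends with </div><!-- END: .CLASS -->.
function strip_marked_div(string $content, string $class): string
{
    // Escape regex metacharacters in the class name; '/' is our delimiter.
    $quoted = preg_quote($class, '/');

    // Lazy .*? stops at the first end marker; /s lets '.' match newlines.
    $pattern = '/<div class="' . $quoted . '">.*?<\/div><!-- END: \.' . $quoted . ' -->/s';

    $result = preg_replace($pattern, '', $content);

    // preg_replace returns null on error; fall back to the untouched input.
    return $result ?? $content;
}

$content = '<p>Before</p>'
    . '<div class="sometestclass"><img src="a.png"><div>inner</div></div><!-- END: .sometestclass -->'
    . '<p>After</p>';

echo strip_marked_div($content, 'sometestclass');
```

Because the match is lazy, a marked block nested inside another marked block would still be cut at the first end marker, which is the limitation described above.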

Upvotes: 6

Sampson

Reputation: 268364

I wouldn't use a regular expression. Instead, I would use the DOMDocument class. Just find all of the div elements with that class, and remove them from their parent(s):

$html = "<p>Hello World</p>
         <div class='sometestclass'>
           <img src='foo.png'/>
           <div>Bar</div>
         </div>";

$dom = new DOMDocument;
$dom->loadHTML( $html );

$xpath = new DOMXPath( $dom );
$pDivs = $xpath->query(".//div[@class='sometestclass']");

foreach ( $pDivs as $div ) {
  $div->parentNode->removeChild( $div );
}

echo preg_replace( "/.*<body>(.*)<\/body>.*/s", "$1", $dom->saveHTML() );

Which results in:

<p>Hello World</p>
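
A possible variation on the approach above, offered as a sketch rather than a definitive implementation: the query `@class='sometestclass'` only matches when the attribute is exactly that string, so a div with several classes would be missed. The hypothetical helper `remove_divs_by_class` below matches the class as one token among many, and serializes the children of `<body>` directly instead of trimming the output with a regex. It assumes `$class` is a plain class name without quotes or spaces.

```php
<?php
// Hypothetical helper: removes all <div> elements carrying the given class.
function remove_divs_by_class(string $html, string $class): string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    // The leading processing instruction is the UTF-8 hint from the PHP manual comments.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Match the class as a whole token inside a space-separated class list.
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";

    // Copy the node list to an array first, since we remove nodes while looping.
    foreach (iterator_to_array($xpath->query($query)) as $div) {
        if ($div->parentNode !== null) {
            $div->parentNode->removeChild($div);
        }
    }

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize only the children of <body>, skipping the html/body wrappers.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = "<p>Hello World</p><div class='sometestclass extra'><img src='foo.png'/><div>Bar</div></div>";
echo remove_divs_by_class($html, 'sometestclass');
```

In a WordPress theme, a helper like this could be hooked into the content filter, but that wiring is left out here because it depends on the theme.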

Upvotes: 9

Ryan

Reputation: 640

How about just some CSS: .sometestclass { display: none; }?

Upvotes: 0
