preg_replace - keep what has been replaced in a variable

Question

I was wondering if there is an elegant way to perform a preg_replace, but still keep what has been replaced by the preg_replace.

As an example, imagine a string containing the HTML of a random site. I want to remove the <head> element from that string for further processing and still keep the content of the element in an extra variable (e.g. for parsing meta flags).

I can think of two possibilities to do that (without using global variables):

if (preg_match('%<head>(.*?)</head>%ism', $html, $matches)) {
    $html = preg_replace('%<head>(.*?)</head>%ism', '', $html);
    $head = $matches[1];
}

This one has to run the regex twice, which is not ideal.

$head = '';
$html = preg_replace_callback(
        "%<head>(.*?)</head>%ism",
        function ($match) use (&$head) {
            $head .= $match[1];
            return '';
        },
        $html
);

I was wondering if there is a more elegant/efficient way to do that.

Niet the Dark Absol · Accepted Answer

You are trying to do two things: retrieve the head content, and remove the head content. Trying to merge two similar but distinct things into one is only going to cause frustration.

Personally, I would go with the first of your two proposed options, but put the regex into a variable and re-use that instead of typing out the regex twice. Makes it easier to change later.
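As a rough sketch of that suggestion: the pattern lives in one variable and both calls use it. The sample input and the name $headPattern are made up for illustration, and the pattern assumes the element in question is <head>; the `\b[^>]*` part is an addition so that attributes on the tag are tolerated.

```php
<?php
// Hypothetical sample input, only for illustration.
$html = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

// Define the pattern once so the match and the replacement cannot drift apart.
// Assumption: the element to extract is <head>, possibly with attributes.
$headPattern = '%<head\b[^>]*>(.*?)</head>%is';

$head = '';
if (preg_match($headPattern, $html, $matches)) {
    // Group 1 holds the content between the opening and closing tag.
    $head = $matches[1];
    // Limit of 1: remove only the occurrence that was just captured.
    $html = preg_replace($headPattern, '', $html, 1);
}
```

The regex still runs twice, but there is only one place to edit if the pattern needs to change.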

But then again, have you considered using a parser?

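$dom = new DOMDocument();
$dom->loadHTML($html_source_here);
// Grab the first (normally the only) <head> element.
$headelement = $dom->getElementsByTagName('head')[0];
// Serialize it before detaching it from the document.
$headhtml = $dom->saveHTML($headelement);
$headelement->parentNode->removeChild($headelement);
// What is left is the document without its <head>.
$result = $dom->saveHTML();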

Now you have both $headhtml (which will include the <head>...</head> wrapper, complete with any attributes that may be on it), and $result, the HTML with the <head> removed.
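If the input may lack a <head> or contain sloppy markup, the same idea can be wrapped defensively. This is only a sketch: splitHead is a hypothetical helper name, and returning an empty string when no <head> exists is an assumption about what the caller wants.

```php
<?php
// Hypothetical helper: returns [head HTML, remaining document HTML].
function splitHead(string $source): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML often triggers parser warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($source);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // item(0) yields null when the document has no <head>.
    $headElement = $dom->getElementsByTagName('head')->item(0);
    if ($headElement === null) {
        return ['', $dom->saveHTML()];
    }

    // Serialize the element first, then detach it from the tree.
    $headHtml = $dom->saveHTML($headElement);
    $headElement->parentNode->removeChild($headElement);

    return [$headHtml, $dom->saveHTML()];
}

[$headHtml, $rest] = splitHead(
    '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>'
);
```

Note that saveHTML() re-serializes the whole document, so the remaining HTML may differ in whitespace or include a doctype the input did not have.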
