D B

Reputation: 532

preg_replace - keep what has been replaced in a variable

I was wondering if there is an elegant way to perform a preg_replace, but still keep what has been replaced by the preg_replace.

As an example, imagine a string containing the HTML of a random site. I want to remove the <head> from that string for further processing, but still keep the content of the <head> element in a separate variable (e.g. for parsing meta tags).

I can think of two possibilities to do that (without using global variables):

if (preg_match('%<head>(.*?)</head>%ism', $html, $matches)) {
    $html = preg_replace('%<head>(.*?)</head>%ism', '', $html);
    $head = $matches[1];
}

This one has to run the regex twice, which is not ideal.

$head = '';
$html = preg_replace_callback(
        "%<head>(.*?)</head>%ism",
        function ($match) use (&$head) {
            $head .= $match[1];
            return '';
        },
        $html
);

I was wondering if there is a more elegant/efficient way to do that.

Upvotes: 0

Views: 132

Answers (1)

Niet the Dark Absol

Reputation: 324620

You are trying to do two things: retrieve the head content, and remove the head content. Trying to merge two similar but distinct operations into one is only going to cause frustration.

Personally, I would go with the first of your two proposed options, but put the regex into a variable and re-use that instead of typing out the regex twice. Makes it easier to change later.
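A minimal sketch of that suggestion, assuming the same pattern as in the question. The sample `$html` string is made up for illustration, and the `1` passed as the limit to preg_replace is my own addition so that only the match that was just captured gets removed:

```php
<?php
// Hypothetical sample input; any HTML string works here.
$html = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

// Define the pattern once so the match and the replace cannot drift apart.
$pattern = '%<head>(.*?)</head>%ism';

$head = '';
if (preg_match($pattern, $html, $matches)) {
    // Group 1 is the content between <head> and </head>.
    $head = $matches[1];
    // Limit of 1: remove only the first match, the one captured above.
    $html = preg_replace($pattern, '', $html, 1);
}
```

The regex still runs twice, but the pattern lives in one place, and the cost of the second pass is unlikely to matter for a single page of HTML.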

But then again, have you considered using a parser?

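$dom = new DOMDocument();
$dom->loadHTML($html_source_here);
// item(0) returns the first <head> node
$headelement = $dom->getElementsByTagName('head')->item(0);
// Serialize the <head> node before detaching it
$headhtml = $dom->saveHTML($headelement);
$headelement->parentNode->removeChild($headelement);
// Serialize the remaining document, now without <head>
$result = $dom->saveHTML();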

Now you have both $headhtml (which includes the <head>...</head> wrapper, complete with any attributes that may be on it) and $result, the HTML with the <head> removed.
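If you want this to be reusable, the parser approach could be wrapped up roughly like this. It is only a sketch: `splitHead` is a hypothetical helper name, and the libxml error handling is an assumption on my part, added because real-world markup often triggers parser warnings:

```php
<?php
// Hypothetical helper (not part of PHP): returns [head HTML, remaining HTML].
function splitHead(string $html): array
{
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Discard collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $head = $dom->getElementsByTagName('head')->item(0);
    if ($head === null) {
        // No <head> found: nothing to extract, return the serialized document.
        return ['', $dom->saveHTML()];
    }

    // Serialize the <head> node first, then detach it from the tree.
    $headHtml = $dom->saveHTML($head);
    $head->parentNode->removeChild($head);

    return [$headHtml, $dom->saveHTML()];
}

// Usage with a made-up sample document.
[$headHtml, $rest] = splitHead(
    '<html><head><title>T</title></head><body><p>x</p></body></html>'
);
```

Keep in mind that the serialized result is the document as the parser sees it, so it may differ from the input string (for example, a DOCTYPE can be added if the source had none).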

Upvotes: 2
