Reputation: 532
I was wondering if there is an elegant way to perform a preg_replace, but still keep what has been replaced by the preg_replace.
As an example, imagine a string containing the HTML of a random site. I want to remove the <head> element from that string for further processing and still keep the content of the <head> element in an extra variable (e.g. for parsing meta tags).
I can think of two possibilities to do that (without using global variables):
if (preg_match('%<head>(.*?)</head>%ism', $html, $matches)) {
    $html = preg_replace('%<head>(.*?)</head>%ism', '', $html);
    $head = $matches[1];
}
This one has to run the regex twice, which is not ideal.
$head = '';
$html = preg_replace_callback(
    "%<head>(.*?)</head>%ism",
    function ($match) use (&$head) {
        $head .= $match[1];
        return '';
    },
    $html
);
I was wondering if there is a more elegant/efficient way to do that.
Upvotes: 0
Views: 132
Reputation: 324620
You are trying to do two things: retrieve the head content, and remove the head content. Trying to merge two (similar, but) distinct things into one is only going to cause frustration.
Personally, I would go with the first of your two proposed options, but put the regex into a variable and re-use that instead of typing out the regex twice. Makes it easier to change later.
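A minimal sketch of that suggestion, assuming a hypothetical sample string in $html. The pattern lives in a single variable ($pattern, a name chosen here for illustration) so both calls stay in sync. The "m" flag from the question is dropped because the pattern has no ^ or $ anchors, and the limit of 1 on preg_replace is an added assumption so that only the match captured by preg_match is removed.

```php
<?php
// Hypothetical sample input; any HTML string would do.
$html = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

// Define the pattern once and reuse it for both the match and the replace.
// "i" = case-insensitive tag names, "s" = let "." match newlines as well.
$pattern = '%<head>(.*?)</head>%is';

$head = '';
if (preg_match($pattern, $html, $matches)) {
    // Group 1 holds the content between <head> and </head>.
    $head = $matches[1];
    // Remove only the first match, i.e. the one captured above.
    $html = preg_replace($pattern, '', $html, 1);
}
```

The regex still runs twice, but changing it later means touching one line only.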
But then again, have you considered using a parser?
$dom = new DOMDocument();
$dom->loadHTML($html_source_here);
$headelement = $dom->getElementsByTagName('head')[0];
$headhtml = $dom->saveHTML($headelement);
$headelement->parentNode->removeChild($headelement);
$result = $dom->saveHTML();
Now you have both $headhtml (which will include the <head>...</head> wrapper, complete with any attributes that may be on it), and $result, the HTML with the <head> removed.
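Since the question mentions parsing meta tags, here is a rough sketch of how the parser approach could cover that as well. The sample $html string and the $meta array are assumptions made for illustration, and libxml_use_internal_errors() is only there to keep warnings about imperfect real-world markup quiet.

```php
<?php
// Hypothetical sample input with a single named meta tag.
$html = '<html><head><meta name="robots" content="noindex"><title>Demo</title></head>'
      . '<body><p>Hi</p></body></html>';

// Collect parser warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$meta = [];
$head = $dom->getElementsByTagName('head')->item(0);
if ($head !== null) {
    // Read name/content pairs from the <meta> elements inside <head>.
    foreach ($head->getElementsByTagName('meta') as $tag) {
        if ($tag->hasAttribute('name')) {
            $meta[$tag->getAttribute('name')] = $tag->getAttribute('content');
        }
    }
    // Detach <head> from the document once its data has been read.
    $head->parentNode->removeChild($head);
}

// Serialized document without the <head> element.
$result = $dom->saveHTML();
```

With this input, $meta ends up as ['robots' => 'noindex'], and $result no longer contains a <head> element.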
Upvotes: 2