Cain Nuke

Reputation: 3083

using preg_match with html comments

I want to extract the HTML between these comments into a string:

<!--content-start-->
 desired html
<!--content-end-->

So I use preg_match, right?

preg_match("/<!--content-start-->(.*)<!--content-end-->/i", $rss, $content);

But it won't work. Is there a problem with my regex?

Thank you.

Upvotes: 0

Views: 538

Answers (2)

miken32

Reputation: 42690

Something like this should work. The XPath query finds the comment whose text is exactly "content-start" and returns all the sibling nodes that follow it. We loop through them until we reach the closing comment.

$html = <<< HTML
<!--content-start-->
<p>Here is my <i>desired html</i></p>
<!-- a comment -->
<div class="foo">Here is more</div>
<!--content-end-->
<p>Not returning this</p>
HTML;
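$return = "";
$dom = new DomDocument;
// Parse the fragment without adding a doctype or implied <html>/<body> wrappers
$dom->loadHTML($html, LIBXML_HTML_NODEFDTD | LIBXML_HTML_NOIMPLIED);
$xpath = new DomXpath($dom);
// Every node that follows the <!--content-start--> comment at the same level
$siblings = $xpath->query("//comment()[.='content-start']/following-sibling::node()");
foreach ($siblings as $node) {
    // Stop once we reach the <!--content-end--> comment
    if ($node instanceof DOMComment && $node->textContent === "content-end") {
        break;
    }
    // Serialize each collected node back to HTML and append it
    $return .= $dom->saveHTML($node) . "\n";
}
echo $return;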

Output:

<p>Here is my <i>desired html</i></p>
<!-- a comment -->
<div class="foo">Here is more</div>

Upvotes: 1

drmad

Reputation: 620

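You probably need the /s modifier. Without it, the dot does not match newlines, so (.*) cannot span the line breaks between the two comments. From the documentation: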

s (PCRE_DOTALL)

If this modifier is set, a dot metacharacter in the pattern matches all characters, including newlines. Without it, newlines are excluded. This modifier is equivalent to Perl's /s modifier. A negative class such as [^a] always matches a newline character, independent of the setting of this modifier.
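A minimal sketch of the difference, assuming the markers and the HTML between them sit on separate lines as in the question. The sample string is made up for illustration, and the lazy `.*?` is my own addition (not from the question): it stops at the first end marker in case the input contains more than one block.

```php
<?php
// Sample input (hypothetical): markers and content on separate lines
$rss = <<<HTML
<p>Before</p>
<!--content-start-->
<p>Here is my <i>desired html</i></p>
<!--content-end-->
<p>After</p>
HTML;

// Without /s, "." stops at newlines, so the pattern cannot span the lines
$withoutS = preg_match('/<!--content-start-->(.*)<!--content-end-->/i', $rss, $m);
var_dump($withoutS); // int(0)

// With /s, "." also matches newlines; ".*?" stops at the first end marker
if (preg_match('/<!--content-start-->(.*?)<!--content-end-->/is', $rss, $content)) {
    // $content[0] is the whole match, $content[1] is the captured HTML
    echo trim($content[1]), "\n"; // <p>Here is my <i>desired html</i></p>
}
```

The captured group keeps the surrounding newlines, which is why the sketch trims it before printing.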

Upvotes: 1
