Keyur Padalia

Reputation: 2097

How do I get data from two HTML Divs using preg_match?

I am trying to get the data between these two divs:

--<div id="p_tab4" class="p_desc" style="display: block;">
--<div id="p_top_cats" class="p_top_cats">

I am using the regex below, but it does not match anything:

/<div id=\"p_tab4\" class=\"p_desc\" style=\"display: block;\">(.*?)<div id=\"p_top_cats\" class=\"p_top_cats\">/

How can I correct this regex?

Upvotes: 0

Views: 605

Answers (2)

Ja͢ck

Reputation: 173662

Assuming your HTML is somewhat well formed, like below:

<div id="p_tab4" class="p_desc" style="display: block;">...</div>
some stuff in between
<div id="p_top_cats" class="p_top_cats">
</div>

You can make use of DOMDocument and XPath:

$html = <<<'EOS'
<div id="p_tab4" class="p_desc" style="display: block;">...</div>
some stuff in between
<div id="p_top_cats" class="p_top_cats">
</div>
EOS;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$query = '//node()[preceding-sibling::div[@id="p_tab4"] and following-sibling::div[@id="p_top_cats"]]';

foreach ($xpath->query($query) as $node) {
    echo $node->textContent, PHP_EOL;
}
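If you need the markup between the two divs rather than only its text, a minimal sketch along the same lines might look like the following. It assumes the two divs are siblings, as in the well-formed example above. `htmlBetween` is a hypothetical helper name, not part of DOMDocument, and the ids are interpolated into the XPath expression unescaped, so it only suits ids that contain no quotes.

```php
<?php
// Hypothetical helper (not from the answer): returns the markup of all nodes
// that sit between two sibling divs, identified by their id attributes.
function htmlBetween(string $html, string $startId, string $endId): string
{
    $doc = new DOMDocument;
    // Collect libxml warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumption: the ids contain no double quotes, so plain interpolation is safe.
    $query = sprintf(
        '//node()[preceding-sibling::div[@id="%s"] and following-sibling::div[@id="%s"]]',
        $startId,
        $endId
    );

    $out = '';
    foreach ($xpath->query($query) as $node) {
        // saveHTML($node) serializes the node itself, tags included.
        $out .= $doc->saveHTML($node);
    }
    return trim($out);
}

$html = <<<'EOS'
<div id="p_tab4" class="p_desc" style="display: block;">...</div>
<p>some <b>stuff</b> in between</p>
<div id="p_top_cats" class="p_top_cats">
</div>
EOS;

$between = htmlBetween($html, 'p_tab4', 'p_top_cats');
echo $between, PHP_EOL;
```

Using `saveHTML($node)` instead of `textContent` keeps inner tags such as `<b>` in the result, which is usually what you want when the content is passed on as HTML.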

Upvotes: 2

Avinash Raj

Reputation: 174874

There is nothing wrong with the pattern itself, but you need to turn on DOTALL mode with the s modifier so that the dot in your regex also matches newline characters (line breaks). Without it, .*? cannot span the line break between the two divs.

~<div id=\"p_tab4\" class=\"p_desc\" style=\"display: block;\">(.*?)<div id=\"p_top_cats\" class=\"p_top_cats\">~s

Code:

$re = '~<div id=\"p_tab4\" class=\"p_desc\" style=\"display: block;\">(.*?)<div id=\"p_top_cats\" class=\"p_top_cats\">~s';
$str = "--<div id=\"p_tab4\" class=\"p_desc\" style=\"display: block;\">\n--<div id=\"p_top_cats\" class=\"p_top_cats\">";
// preg_match() returns 1 on a match; only then is $matches[1] set.
if (preg_match($re, $str, $matches)) {
    echo $matches[1];
}
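To see the effect of the modifier in isolation, here is a small sketch that runs the same pattern against the same input with and without s. The variable names are only illustrative.

```php
<?php
// The same pattern is used twice: once without modifiers, once with s (DOTALL).
$pattern = '<div id="p_tab4" class="p_desc" style="display: block;">(.*?)<div id="p_top_cats" class="p_top_cats">';
$str = "--<div id=\"p_tab4\" class=\"p_desc\" style=\"display: block;\">\n--<div id=\"p_top_cats\" class=\"p_top_cats\">";

// Without s, the dot does not match "\n", so the gap between the divs cannot be bridged.
$withoutS = preg_match('~' . $pattern . '~', $str, $m1);

// With s, the dot also matches "\n" and the lazy group captures across the line break.
$withS = preg_match('~' . $pattern . '~s', $str, $m2);

var_dump($withoutS); // int(0)
var_dump($withS);    // int(1)
var_dump($m2[1]);    // the captured text: a newline followed by "--"
```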


Upvotes: 1
