Reputation: 349
So I have the following HTML:
<td class="testing">
<strong><span><a href="whatever">test</a></span></strong>
<div class="body" id="id_1234">test</div>
</td>
<td class="testing">
<strong><span><a href="whatever2">test</a></span></strong>
<div class="body" id="id_5678">test</div>
</td>
<td class="testing2">
<strong><span><a href="whatever2">test2</a></span></strong>
<div class="body" id="id_9012">test</div>
</td>
And I have the following regex that tries to get both 1234 and 5678:
~class="testing">\s*?<strong>.*?<a href=".*?">test</a>.*?<div class="body" id="id_(.*)">~Us
However, this returns only 5678, and not both:
[1] => Array
(
[0] => 5678
)
How could I make it use the shortest overall match? I already use the ? modifier after every .*, as well as the U modifier at the end.
Thanks!
Upvotes: 0
Views: 80
Reputation: 164798
Using DOM and XPath
$html = <<<_HTML
<td class="testing">
<strong><span><a href="whatever">test</a></span></strong>
<div class="body" id="id_1234">test</div>
</td>
<td class="testing">
<strong><span><a href="whatever2">test</a></span></strong>
<div class="body" id="id_5678">test</div>
</td>
<td class="testing2">
<strong><span><a href="whatever2">test2</a></span></strong>
<div class="body" id="id_9012">test</div>
</td>
_HTML;
$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
// .//a keeps the link test relative to each td; a bare //a would search the whole document
$divs = $xp->query('//td[@class="testing" and .//a[normalize-space(text())="test"]]/div[@class="body" and starts-with(@id, "id_")]');
$ids = array();
foreach ($divs as $div) {
    // strip the leading "id_" to keep only the number
    $ids[] = substr($div->getAttribute('id'), 3);
}
Example here - http://codepad.viper-7.com/GbKIj2
Upvotes: 2
Reputation: 89557
Your pattern doesn't work because of a misunderstanding of the U modifier.
U doesn't simply make every quantifier ungreedy (lazy). It is a switch that inverts the default greediness, so when you use it:
1) all the greedy quantifiers become ungreedy (lazy)
2) all the ungreedy (lazy) quantifiers become greedy.
Since you use the U modifier in your pattern, every .*? is greedy and only the final (.*) is lazy. Either drop the U modifier or drop the ? after each quantifier, but don't combine the two.
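To make the switch behaviour concrete, here is a minimal sketch. The first part uses a made-up string, aXbXc, to show how U flips greediness. The second part applies one possible fix to the HTML from the question: no U modifier, lazy .*? kept as written, and two tightenings that are my own assumptions rather than part of the original pattern ([^"]* for the href and \d+ for the id). Because .*? combined with the s modifier can still run across td boundaries on differently shaped markup, treat this as an illustration for the sample input only.

```php
<?php
// Part 1: the U switch on a tiny, made-up string.
$s = 'aXbXc';

preg_match('~a(.*)X~', $s, $m);    // greedy by default
$greedy = $m[1];                   // "Xb"

preg_match('~a(.*)X~U', $s, $m);   // U turns .* lazy
$lazyViaU = $m[1];                 // ""

preg_match('~a(.*?)X~U', $s, $m);  // U flips .*? back to greedy
$flipped = $m[1];                  // "Xb"

// Part 2: the question's HTML with the U modifier removed.
$html = <<<'HTML'
<td class="testing">
<strong><span><a href="whatever">test</a></span></strong>
<div class="body" id="id_1234">test</div>
</td>
<td class="testing">
<strong><span><a href="whatever2">test</a></span></strong>
<div class="body" id="id_5678">test</div>
</td>
<td class="testing2">
<strong><span><a href="whatever2">test2</a></span></strong>
<div class="body" id="id_9012">test</div>
</td>
HTML;

// Assumed tightenings: [^"]* cannot leave the href value, \d+ cannot leave the id.
$pattern = '~class="testing">\s*<strong>.*?<a href="[^"]*">test</a>'
         . '.*?<div class="body" id="id_(\d+)">~s';

preg_match_all($pattern, $html, $matches);
print_r($matches[1]);              // holds "1234" and "5678"
```

The third td is skipped because its class is testing2, so the literal class="testing"> never matches there.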
Upvotes: 2
Reputation: 24645
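This produces the results you are after for the sample below, although it matches every id="id_..." attribute without checking the td class or the link text: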
<?php
$str = '<td class="testing">
<strong><span><a href="whatever">test</a></span></strong>
<div class="body" id="id_1234">test</div>
</td>
<td class="testing">
<strong><span><a href="whatever2">test2</a></span></strong>
<div class="body" id="id_5678">test</div>
</td>';
$matches = array();
preg_match_all('/id="id_([0-9]+)"/', $str, $matches);
print_r($matches[1]);
Upvotes: 0
Reputation: 13535
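You can use preg_match_all. It already returns every match, so no g modifier is needed (PHP's PCRE functions don't have one and reject it as an unknown modifier):
preg_match_all('/id="id_([0-9]+)"/', $html, $matches);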
Upvotes: 0