Reputation: 418
I'm trying to extract the words within the <li></li> tags below. My regex is working well, but it only gives me the first <li>, Lorem ipsum...
I'm reasonably new to regex, and I am aware it would likely be more reliable to do this by traversing the DOM, but in this case regex is preferred. Any ideas what I need to change to get all the results, instead of just the one?
/<div class="foo-bar">[\s\S]+<ul>[\s\S]*?(<li>([\s\S]*?)<\/li>)+[\s\S]*?<\/ul>/
<div class="foo-bar">
<!-- Other junk -->
<ul>
<li>
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</li>
<li>
Vestibulum iaculis nibh ac orci imperdiet ultrices.
</li>
<li>
Fusce neque lacus, feugiat eget sapien eget, ullamcorper rutrum mauris.
</li>
<li>
Maecenas in ipsum consectetur, finibus ex et, condimentum turpis.
</li>
</ul>
<!-- Other junk -->
</div>
Upvotes: 1
Views: 60
Reputation: 19502
Use DOM + XPath, not regex.
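// $html holds the markup from the question.
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
// Select every <li> under the <ul> inside the foo-bar div.
foreach ($xpath->evaluate('//div[@class="foo-bar"]/ul/li') as $li) {
    var_dump($li->textContent);
}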
Output:
string(80) "
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
"
string(75) "
Vestibulum iaculis nibh ac orci imperdiet ultrices.
"
string(95) "
Fusce neque lacus, feugiat eget sapien eget, ullamcorper rutrum mauris.
"
string(89) "
Maecenas in ipsum consectetur, finibus ex et, condimentum turpis.
"
Upvotes: 1
Reputation: 815
It would be better to do this in two passes with preg_match_all(). I just tested it here and it's working.
First, run preg_match_all() with the following pattern, which matches the foo-bar <div> and captures the content of its <ul>:
/<div class="foo-bar">([\s\S]*?)<ul>([\s\S]*?)<\/ul>([\s\S]*?)<\/div>/
Then run preg_match_all() on the result of the previous preg_match_all() with the following pattern to get only the <li> contents:
/<li>([\s\S]*?)<\/li>/
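The two passes can be sketched as follows. This is an illustration under assumptions, not a tested drop-in: $html, $blocks, and $items are names chosen here, the first pattern is a simplified variant that captures only the <ul> content (so it lands in group 1), and `~` is used as the delimiter so the slashes in closing tags need no escaping.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<div class="foo-bar">
<!-- Other junk -->
<ul>
<li>
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
</li>
<li>
Vestibulum iaculis nibh ac orci imperdiet ultrices.
</li>
<li>
Fusce neque lacus, feugiat eget sapien eget, ullamcorper rutrum mauris.
</li>
<li>
Maecenas in ipsum consectetur, finibus ex et, condimentum turpis.
</li>
</ul>
<!-- Other junk -->
</div>
HTML;

// Pass 1: find each foo-bar div and capture the content of its <ul>.
preg_match_all(
    '~<div class="foo-bar">[\s\S]*?<ul>([\s\S]*?)</ul>[\s\S]*?</div>~',
    $html,
    $blocks
);

// Pass 2: pull the body of every <li> out of each captured block.
$items = [];
foreach ($blocks[1] as $block) {
    preg_match_all('~<li>([\s\S]*?)</li>~', $block, $matches);
    foreach ($matches[1] as $text) {
        $items[] = trim($text);
    }
}

print_r($items);
```

Splitting the work this way sidesteps the original problem: a repeated capture group keeps only one repetition, whereas preg_match_all() on the inner pattern returns every <li>.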
Upvotes: 0
Reputation: 698
Add the global g flag at the end. For example:
/<div class="foo-bar">[\s\S]+<ul>[\s\S]*?(<li>([\s\S]*?)<\/li>)+[\s\S]*?<\/ul>/g
You may also want the i flag for case-insensitive matching.
Upvotes: 0