Reputation: 8965
I have the following text:
<h4>Section 1</h4>
<ul>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
</ul>
<h4>Section 2</h4>
<ul>
<li><a href="http://link">link text</a></li>
</ul>
<h4>Section 3</h4>
<ul>
<li><a href="http://link">link text</a></li>
</ul>
This is the regex I have constructed so far:
<h4>(.*?)</h4>
<ul>
(.*?)
</ul>
but it only matches "Section 2" and "Section 3". How can I make it match all the sections, including "Section 1"?
Upvotes: 1
Views: 366
Reputation: 44823
It depends on the language you are using (PHP, Perl, etc.), but it will be something like this:
(?s)<h4>(.*?)</h4>\s*<ul>(.*?)</ul>
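The (?s) modifier lets . match newline (\n) characters. Without it, . stops at each line break, so your (.*?) could only capture a single line, which is why only the one-item lists in "Section 2" and "Section 3" matched.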
For example, in PHP, you can do something like this:
// The regex
$regex = '#(?s)<h4>(.*?)</h4>\s*<ul>(.*?)</ul>#';
// Test data
$data = '<h4>Section 1</h4>
<ul>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
</ul>
<h4>Section 2</h4>
<ul>
<li><a href="http://link">link text</a></li>
</ul>
<h4>Section 3</h4>
<ul>
<li><a href="http://link">link text</a></li>
</ul>';
// Get all matches
preg_match_all($regex, $data, $matches);
// Just to show the results: capture the var_dump() output and escape it
// so the HTML tags are visible in a browser
ob_start();
var_dump($matches);
echo "<pre>" . htmlentities(ob_get_clean()) . "</pre>";
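The same pattern should carry over to other regex flavors that support the inline (?s) modifier. As a rough sketch, assuming Python's re module (the variable names data, without_flag and with_flag are only illustrative, and the test data is the question's text with the broken closing tag repaired), this compares the original pattern with the modified one:

```python
import re

# Test data from the question (illustrative variable name).
data = """<h4>Section 1</h4>
<ul>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
<li><a href="http://link">link text</a></li>
</ul>
<h4>Section 2</h4>
<ul>
<li><a href="http://link">link text</a></li>
</ul>
<h4>Section 3</h4>
<ul>
<li><a href="http://link">link text</a></li>
</ul>"""

# Original pattern from the question: "." does not cross line breaks,
# so a list with more than one line cannot match.
without_flag = re.findall(r"<h4>(.*?)</h4>\n<ul>\n(.*?)\n</ul>", data)

# Pattern from this answer: (?s) makes "." match newlines as well.
with_flag = re.findall(r"(?s)<h4>(.*?)</h4>\s*<ul>(.*?)</ul>", data)

# Each result is a (heading, list body) tuple; print only the headings.
print([title for title, _ in without_flag])
print([title for title, _ in with_flag])
```

The first list should contain only "Section 2" and "Section 3", the second all three headings. In Python the flag can also be passed as re.DOTALL instead of writing (?s) inside the pattern.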
Upvotes: 3