Reputation: 1493
I have a variable, $text, which outputs a load of HTML. Most of it is not useful for my purposes, but some of it is.
<div class="feed-item-description">
<ul>
<li><strong>Impact:</strong> Low</li>
<li><strong>Severity:</strong> <span class="label label-info">Low</span></li>
</ul>
...
I'd like to get the impact and the severity rating out of this text. I don't need the label.
I have tried doing this:
$itemAttributes = explode (':' , $text);
$impact = $itemAttributes[3];
$severity = $itemAttributes[4];
This does seem to give me the attributes I want, but each value also drags along the word that follows it. It also behaves strangely: even if I trim it, I cannot get rid of the leading space in my output.
It also seems to close a <div> behind it, which I can't explain. I'm sure I'm about to get shouted down for using Regex on HTML, but I figured there must be a simple way to get this out, since the same words precede the information I want every time.
If you want to see the actual output on a page, you can see it here: https://dev.joomlalondon.co.uk/. In the output I generate, it closes the <div class="feed-item-description">, but I don't tell it to do that anywhere, and the output I use is contained within an <li>, not a <div>.
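To illustrate the symptoms described above, here is a minimal sketch of what explode(':') actually returns. It assumes $text holds only the sample fragment shown earlier; the real feed presumably has more fields in front, so the indexes would differ from the 3 and 4 used in the question.

```php
<?php
// Assumption: $text is just the sample fragment from the question.
$text = <<<'HTML'
<div class="feed-item-description">
<ul>
<li><strong>Impact:</strong> Low</li>
<li><strong>Severity:</strong> <span class="label label-info">Low</span></li>
</ul>
HTML;

// Splitting on ':' cuts in the middle of the markup.
$pieces = explode(':', $text);

// The piece after "Impact:" starts with "</strong> Low</li>" and runs on
// into the opening tags and label of the next list item.
$impactPiece = $pieces[1];
echo $impactPiece, "\n";

// trim() only strips the two ends of the string, so the space that sits
// between "</strong>" and "Low" is untouched.
$trimmed = trim($impactPiece);

// Removing the tags first leaves plain text, but the next label
// ("Severity") is still attached after a newline.
$plain = trim(strip_tags($impactPiece));
echo $plain, "\n";
```

The space that trim() cannot remove sits after the closing </strong> tag rather than at the start of the string. Each piece also carries stray closing tags, which may be why the surrounding markup appears to close early when the piece is echoed into a page.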
Upvotes: 0
Views: 103
Reputation: 147146
Because you should really use DOMDocument to parse HTML, here's a solution using it:
// $html holds the feed markup ($text in the question).
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$feed_items = $xpath->query('//div[contains(@class, "feed-item-description")]');
foreach ($feed_items as $feed_item) {
    // The leading "." keeps each query relative to the current feed item.
    $impact_node = $xpath->query('.//li[contains(string(), "Impact:")]', $feed_item);
    $impact = preg_replace('/Impact:\W*/', '', $impact_node->item(0)->textContent);
    echo $impact . "\n";
    $severity_node = $xpath->query('.//li[contains(string(), "Severity:")]', $feed_item);
    $severity = preg_replace('/Severity:\W*/u', '', $severity_node->item(0)->textContent);
    echo $severity . "\n";
}
Output (for your sample HTML)
Low
Low
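If the page contains several feed items, the same idea can be wrapped in a small function. This is only a sketch: extractRatings() and its $labels parameter are hypothetical names, and the second feed item in the usage example (including its label-danger class) is made up for illustration. Note the relative .//li path: when a context node is passed, an XPath expression starting with // still searches the whole document.

```php
<?php
// Hypothetical helper: returns one ['Impact' => ..., 'Severity' => ...]
// array per feed-item-description block found in $html.
function extractRatings(string $html, array $labels = ['Impact', 'Severity']): array
{
    // HTML fragments can trigger libxml warnings; collect them quietly.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $results = [];
    foreach ($xpath->query('//div[contains(@class, "feed-item-description")]') as $item) {
        $row = [];
        foreach ($labels as $label) {
            // ".//" limits the search to the current feed item.
            $nodes = $xpath->query('.//li[contains(., "' . $label . ':")]', $item);
            if ($nodes->length > 0) {
                // Keep only what follows the first colon, e.g. " Low" -> "Low".
                $text = $nodes->item(0)->textContent;
                $row[$label] = trim(substr($text, strpos($text, ':') + 1));
            }
        }
        $results[] = $row;
    }
    return $results;
}

// Usage with two feed items; the second one is invented sample data.
$html = '<div class="feed-item-description"><ul>'
      . '<li><strong>Impact:</strong> Low</li>'
      . '<li><strong>Severity:</strong> <span class="label label-info">Low</span></li>'
      . '</ul></div>'
      . '<div class="feed-item-description"><ul>'
      . '<li><strong>Impact:</strong> High</li>'
      . '<li><strong>Severity:</strong> <span class="label label-danger">Moderate</span></li>'
      . '</ul></div>';

$ratings = extractRatings($html);
print_r($ratings);
```

Each feed item yields its own pair of values, so the second item reports High and Moderate rather than repeating the first item's ratings.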
Upvotes: 0
Reputation: 27723
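Maybe an expression such as
^\h*(Impact:)\s+(.*)|^\h*(Severity:)\s+(.*)
would simply return the desired values. Both alternatives use \h* so that the labels match whether or not the line is indented.
$re = '/^\h*(Impact:)\s+(.*)|^\h*(Severity:)\s+(.*)/m';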
$str = 'Project: Joomla!
    SubProject: CMS
    Impact: Low
    Severity: Low
    Versions: 3.6.0 - 3.9.12
    Exploit type: Path Disclosure
    Reported Date: 2019-November-01
    Fixed Date: 2019-November-05
    CVE Number: CVE-2019-18674
    Description
    Missing access check in the phputf8 mapping files could lead to an path disclosure.
    Affected Installs
    Joomla! CMS versions 3.6.0 - 3.9.12';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
var_dump($matches);
Output:
array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(15) "    Impact: Low"
    [1]=>
    string(7) "Impact:"
    [2]=>
    string(3) "Low"
  }
  [1]=>
  array(5) {
    [0]=>
    string(17) "    Severity: Low"
    [1]=>
    string(0) ""
    [2]=>
    string(0) ""
    [3]=>
    string(9) "Severity:"
    [4]=>
    string(3) "Low"
  }
}
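Because the alternation puts the two values in different group positions (2 and 4), reading them back is slightly awkward. A possible variation, sketched here under the assumption that $text holds HTML like the question's sample, is to strip the tags first and capture the label and value in the same two groups:

```php
<?php
// Assumption: $text holds HTML shaped like the sample in the question.
$text = '<li><strong>Impact:</strong> Low</li>
<li><strong>Severity:</strong> <span class="label label-info">Low</span></li>';

// Drop the markup so only "Impact: Low" and "Severity: Low" lines remain.
$plain = strip_tags($text);

// Group 1 is the label, group 2 the value; /m makes ^ and $ work per line.
$pattern = '/^\h*(Impact|Severity):\h*(.+?)\h*$/m';

$ratings = [];
if (preg_match_all($pattern, $plain, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {
        $ratings[$match[1]] = $match[2];
    }
}

echo $ratings['Impact'], "\n";
echo $ratings['Severity'], "\n";
```

This still treats the HTML as text, so it only holds up while each label and its value stay on one line of the stripped output.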
If you wish to simplify, update, or explore the expression, it is explained in the top-right panel of regex101.com. Its debugger lets you watch or modify the matching steps, and demonstrates how a regex engine might consume sample input strings step by step while performing the match. jex.im can also be used to visualize regular expressions.
Upvotes: 1