PHP: How to remove certain attributes from a body of text

Question

I have the following variable, $text, which outputs a lot of HTML. Most of it is not useful for my purposes, but some of it is.

HTML that comes out:

Impact: Low
Severity: Low

...

What I'd like to do

I'd like to extract the impact and severity ratings from this text. I don't need the labels.

I have tried doing this:

$itemAttributes = explode(':', $text);

$impact     = $itemAttributes[3];
$severity   = $itemAttributes[4];

This does seem to give me the attributes I want, but each one also picks up the word that follows it. It also behaves strangely: even if I trim it, I cannot get rid of the leading space in my output.
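Two effects could explain this, although both are assumptions because the raw $text is not shown. First, explode() splits at every colon, so each piece runs from one colon to the next and therefore ends with the following field's label. Second, if the space after the colon is a non-breaking space (U+00A0, for example produced by &nbsp; in the HTML), trim() will not remove it, because by default it only strips ASCII whitespace and NUL bytes. A minimal sketch on a hypothetical sample:

```php
<?php
// Hypothetical plain-text sample modelled on the fields shown above.
$text = "Impact: Low\nSeverity: Low";

// explode() splits on every colon, so each piece runs from one colon
// to the next and therefore also contains the NEXT field's label.
$pieces = explode(':', $text);
var_dump($pieces[1]); // holds " Low\nSeverity", not just " Low"

// Assumption: the leading "space" is a non-breaking space (U+00A0).
// trim() only strips " \t\n\r\0\x0B" by default, so it leaves it in place.
$nbspValue = "\u{00A0}Low";
var_dump(trim($nbspValue) === $nbspValue); // bool(true): nothing was removed

// A UTF-8 regex that lists U+00A0 explicitly strips it along with
// ordinary whitespace at both ends of the string.
$clean = preg_replace('/^[\s\x{00A0}]+|[\s\x{00A0}]+$/u', '', $nbspValue);
var_dump($clean); // string(3) "Low"
```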

It also seems to close an HTML tag after it, which I can't explain. I'm sure I'm about to get shouted down for using regex on HTML, but I figured there must be a way to extract something this simple, since the same words always precede the information I want.

If you want to see the actual output on a page, you can see it here: https://dev.joomlalondon.co.uk/. You can see in the output I generate that it closes a tag, but I don't tell it to do that anywhere, and the element my output is contained within is not the one being closed.
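One possible explanation for the stray closing tag, assuming the fields sit in list items as in the hypothetical markup below (the real HTML is not shown): splitting the raw HTML on ':' keeps whatever markup lies between two colons, so the "value" carries a closing tag with it, and echoing that value emits the tag. A sketch that avoids this by removing the markup with strip_tags() first and then splitting each remaining line at its first colon:

```php
<?php
// Hypothetical markup: assumes each field sits on its own line in a list item.
$html = "<ul>\n    <li>Impact: Low</li>\n    <li>Severity: Low</li>\n</ul>";

// Splitting the raw HTML keeps the markup between colons, so this piece
// holds " Low</li>\n    <li>Severity" and echoing it outputs a </li>.
$raw = explode(':', $html);
var_dump($raw[1]);

// Remove the markup first, then work line by line on the plain text.
$plain = strip_tags($html);
$fields = [];
foreach (preg_split('/\R/', $plain) as $line) {
    // Limit of 2 keeps any further colons inside the value.
    $parts = explode(':', $line, 2);
    if (count($parts) === 2) {
        $fields[trim($parts[0])] = trim($parts[1]);
    }
}

echo $fields['Impact'], "\n";   // Low
echo $fields['Severity'], "\n"; // Low
```

This only works if the fields end up on separate lines after the tags are stripped; if the real markup puts them on one line, strip_tags() would run them together.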

Emma · Accepted Answer

Maybe,

^\h*(Impact:)\s+(.*)|^\h+(Severity:)\s+(.*)

would simply return those desired values.
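If only one value is needed at a time, the same idea can be wrapped in a small function. extractField is a hypothetical helper (not part of PHP or of the answer above); it assumes each label starts its own line of plain text and returns null when the label is missing.

```php
<?php
// Hypothetical helper: returns the text after "$label:" on its own line,
// with surrounding spaces/tabs removed, or null if the label is absent.
function extractField(string $text, string $label): ?string
{
    // preg_quote() escapes regex metacharacters; '/' is our delimiter.
    $pattern = '/^\h*' . preg_quote($label, '/') . ':\h*(.*?)\h*$/m';
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

$text = "    Impact: Low\n    Severity: Low\n    Versions: 3.6.0 - 3.9.12";

var_dump(extractField($text, 'Impact'));       // string(3) "Low"
var_dump(extractField($text, 'Severity'));     // string(3) "Low"
var_dump(extractField($text, 'Exploit type')); // NULL
```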

Test

<?php
// The m flag makes ^ match at the start of every line, not just the string.
$re = '/^\h*(Impact:)\s+(.*)|^\h+(Severity:)\s+(.*)/m';
$str = 'Project: Joomla!
    SubProject: CMS
    Impact: Low
    Severity: Low
    Versions: 3.6.0 - 3.9.12
    Exploit type: Path Disclosure
    Reported Date: 2019-November-01
    Fixed Date: 2019-November-05
    CVE Number: CVE-2019-18674

Description

Missing access check in the phputf8 mapping files could lead to an path disclosure.
Affected Installs

Joomla! CMS versions 3.6.0 - 3.9.12';

// PREG_SET_ORDER: one array per match, holding that match's capture groups.
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

var_dump($matches);

Output

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(15) "    Impact: Low"
    [1]=>
    string(7) "Impact:"
    [2]=>
    string(3) "Low"
  }
  [1]=>
  array(5) {
    [0]=>
    string(17) "    Severity: Low"
    [1]=>
    string(0) ""
    [2]=>
    string(0) ""
    [3]=>
    string(9) "Severity:"
    [4]=>
    string(3) "Low"
  }
}
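In the output above, the Severity match leaves groups 1 and 2 empty and puts its values in groups 3 and 4, because each branch of the alternation has its own groups. Note also that the pattern uses \h+ before Severity, which requires at least one leading space or tab. A possible variant, not part of the original answer, puts the alternation inside one named group and uses \h* for both labels, so every match has the same shape and array_column() can turn the matches into a lookup array:

```php
<?php
// Alternation inside a single named group: every match exposes the
// label under 'label' and the value under 'value'.
$re = '/^\h*(?<label>Impact|Severity):\h*(?<value>.*?)\h*$/m';

// Severity is deliberately unindented here; \h* still matches it.
$str = "Project: Joomla!\n    Impact: Low\nSeverity: Low\n    Versions: 3.6.0 - 3.9.12";

preg_match_all($re, $str, $matches, PREG_SET_ORDER);

// Build ['Impact' => 'Low', 'Severity' => 'Low'] keyed by label.
$ratings = array_column($matches, 'value', 'label');

print_r($ratings);
```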

If you wish to simplify, update, or explore the expression, it is explained in the top-right panel of regex101.com. If you're interested, you can watch the matching steps or modify them in this debugger link. The debugger demonstrates how a regex engine might consume some sample input strings step by step and perform the matching.

RegEx Circuit

[Diagram of the expression, visualized with jex.im]