Reputation: 5668

Get number which occurs after its label text in HTML

I'm using PHP to parse an e-mail and want to get the number after a specific string.

For example, I would want to get the number 033 from a string that looks like:

 Account Number: 033 
 Account Information: Some text here

The content is actually HTML, so the input string is more accurately presented as:

<font face="Arial, Helvetica, sans-serif" color="#000099"><strong><font color="#660000">Account  Number</font></strong><font color="#660000">: 033<br><strong>Account Name</strong>: More text here<br>

The label Account Number: is always followed by the number and then a line break. So far I have:

 preg_match_all('!\d+!', $str, $matches);

But that just gets all the numbers.

Upvotes: 1

Answers (4)

Josh

Reputation: 8191

If the number is always after Account Number: (including that space at the end), then just add that to your regex:

preg_match_all('/Account Number: (\d+)/',$str,$matches);
// The parentheses capture the digits and store them in $matches[1]

Results:

$matches Array:
(
    [0] => Array
        (
            [0] => Account Number: 033
        )

    [1] => Array
        (
            [0] => 033
        )

)

Note: If there is HTML present, then that can be included in the regex as well as long as you don't believe the HTML is subject to change. Otherwise, I suggest using an HTML DOM Parser to get to the plain-text version of your string and using a regex from there.

With that said, the following is an example that includes the HTML in the regex and provides the same output as above:

// Note the @ delimiter: the pattern itself contains / characters
preg_match_all('@<font face="Arial, Helvetica, sans-serif" color="#000099"><strong><font color="#660000">Account Number</font></strong><font color="#660000">: (\d+)@',$str,$matches);
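
The DOM-parser route mentioned in the note above could look roughly like the sketch below. It assumes PHP's bundled DOM extension is available; the helper name accountNumberFromHtml() is hypothetical and not part of any library. It uses \s+ between "Account" and "Number" so that extra whitespace in the label does not break the match.

```php
<?php
// Hypothetical helper: reduce the HTML to plain text with DOMDocument,
// then pull the digits that follow the "Account Number:" label.
function accountNumberFromHtml(string $html): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // textContent concatenates all text nodes and drops the markup
    $text = $doc->documentElement !== null
        ? $doc->documentElement->textContent
        : '';

    return preg_match('/Account\s+Number:\s*(\d+)/', $text, $m) === 1
        ? $m[1]
        : null;
}

$html = '<font face="Arial, Helvetica, sans-serif" color="#000099"><strong>'
      . '<font color="#660000">Account Number</font></strong>'
      . '<font color="#660000">: 033<br><strong>Account Name</strong>: More text here<br>';

var_dump(accountNumberFromHtml($html));
```

Because the regex only sees the extracted text, changes to the surrounding tags or attributes should not affect the match.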

Upvotes: 11

mickmackusa

Reputation: 47874

@montes is right to call strip_tags() to simplify the input text before using regex to extract the targeted substring. However, the pattern could use some refinement, and assuming there is only one Account Number per email, preg_match() is a better fit than preg_match_all().

Case-insensitive matching is not needed, so the i pattern modifier serves no purpose.
There are no ^ or $ metacharacters in the pattern, so the m pattern modifier is useless.
There are no . metacharacters in the pattern, so the s pattern modifier is useless.
\K restarts the fullstring match, discarding everything matched before it. This is beneficial because it removes the need for a capture group.
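
To illustrate the \K point, here is a small side-by-side sketch. The variable names and the sample text are made up for the comparison.

```php
<?php
$text = "Account Number: 033\nAccount Name: More text here";

// Capture group: $withGroup[0] holds the full match, $withGroup[1] the digits
preg_match('~Account Number:\s*(\d+)~', $text, $withGroup);

// \K: everything matched before it is dropped from the full match,
// so $withK[0] holds only the digits and no group is needed
preg_match('~Account Number:\s*\K\d+~', $text, $withK);

var_dump($withGroup);
var_dump($withK);
```

With \K the result array has a single element, which keeps the lookup at index 0.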

Code:

$html = '<font face="Arial, Helvetica, sans-serif" color="#000099"><strong><font
    color="#660000">Account Number</font></strong><font color="#660000">: 033<br>
    <strong>Account Name</strong>: More text here<br>';

echo preg_match('~Account Number:\s*\K\d+~', strip_tags($html), $match)
     ? $match[0]
     : 'No Account Number Found';

Output:

033

Upvotes: 0

montes

Reputation: 606

Taking the HTML as the base:

$str = '<font face="Arial, Helvetica, sans-serif" color="#000099"><strong><font
    color="#660000">Account Number</font></strong><font color="#660000">: 033<br>
    <strong>Account Name</strong>: More text here<br>';
preg_match_all('!Account Number:\s+(\d+)!ims', strip_tags($str), $matches);
var_dump($matches);

and we get:

array(2) {
    [0]=>
    array(1) {
        [0]=>
        string(19) "Account Number: 033"
    }
    [1]=>
    array(1) {
        [0]=>
        string(3) "033"
    }
}

Upvotes: 1

kittycat

Reputation: 15045

$str = 'Account Number: 033 
 Account Information: Some text here';

preg_match('/Account Number:\s*(\d+)/', $str, $matches);

echo $matches[1]; // 033

You don't need preg_match_all() here, since preg_match() stops at the first match. Your original pattern also had no capture group: wrapping \d+ in parentheses is what makes the number available in $matches[1].
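
If the label might be missing from some e-mails, it may be worth checking the return value of preg_match() before reading $matches[1]. A minimal sketch, where findAccountNumber() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the digits as a string (so leading zeros
// are preserved), or null when the label is not found.
function findAccountNumber(string $text): ?string
{
    return preg_match('/Account Number:\s*(\d+)/', $text, $matches) === 1
        ? $matches[1]
        : null;
}

$str = 'Account Number: 033 
 Account Information: Some text here';

var_dump(findAccountNumber($str));
var_dump(findAccountNumber('No label here'));
```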

Upvotes: 3
