Reputation: 1730
I'm still learning, and regex is mind-numbing stuff. But I have a working regex to preg_match in PHP any product prices that follow the £ currency symbol. This may be helpful, as I couldn't find a working example that considers all variants (such as thousands separators and decimals). Any improvements to the regex are totally welcome!
My question, though: why does the array contain 3 instances of every number? And what's the meaning of the "2" that follows each one?
(?<=\£|GBP)((\d{1,6}(,\d{3})*)|(\d+))(\.\d{2})?
Function:
function website($url) {
    // Declare the global before its first use; otherwise the first append goes to a local variable.
    global $website_prices;
    $xml = new DOMDocument();
    // @ suppresses the warnings that loadHTMLFile raises on malformed HTML.
    if (@$xml->loadHTMLFile($url)) {
        $xpath = new DOMXPath($xml);
        $textNodes = $xpath->query('//text()');
        foreach ($textNodes as $textNode) {
            if (preg_match('/(?<=\£|GBP)((\d{1,6}(,\d{3})*)|(\d+))(\.\d{2})?/', $textNode->nodeValue, $matches, PREG_OFFSET_CAPTURE)) {
                $website_prices[] = $matches;
            }
        }
    }
}
print_r is dumping:
[3] => Array
(
[0] => Array
(
[0] => 545
[1] => 2
)
[1] => Array
(
[0] => 545
[1] => 2
)
[2] => Array
(
[0] => 545
[1] => 2
)
)
Upvotes: 1
Views: 574
Reputation: 4523
Your current regex has a lot of grouping that isn't needed. The following regex would be suitable in your case:
(?<=£|GBP)[\d.,]+
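Those groups are also what produce the three copies in the question's dump. As a minimal sketch, assuming the file is saved as UTF-8 (so £ takes two bytes): each entry of $matches is one capture group, and because PREG_OFFSET_CAPTURE is passed, each entry is a pair of matched text and byte offset. The "2" is that offset, i.e. the match starts right after the two-byte £. The $lean pattern below is only an illustration of the same regex with non-capturing groups.

```php
<?php
// Assumes this file is saved as UTF-8, so "£" occupies two bytes.
$pattern = '/(?<=£|GBP)((\d{1,6}(,\d{3})*)|(\d+))(\.\d{2})?/';
$subject = '£545';

preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE);

// $matches[0] is the whole match; [1] and [2] are the two nested
// capture groups that took part in it. Groups 3 to 5 did not
// participate and, being trailing, are left out of the array.
foreach ($matches as $group => [$text, $byteOffset]) {
    echo "group $group: '$text' at byte offset $byteOffset\n";
}

// Illustrative variant: non-capturing (?:...) groups and no
// PREG_OFFSET_CAPTURE leave only the matched price.
$lean = '/(?<=£|GBP)(?:\d{1,6}(?:,\d{3})*|\d+)(?:\.\d{2})?/';
preg_match($lean, $subject, $plain);
print_r($plain);
```

Dropping the flag alone removes the offsets; switching to (?:...) removes the repeated entries.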
PHP
(implementation)
<?php
$re = '/(?<=£|GBP)[\d.,]+/';
$str = '£545 £5450 £54.20 £5450 £545,620 £545,620.96
GBP545 GBP5450 GBP54.20 GBP5450 GBP545,620 GBP545,620.96';
preg_match_all($re, $str, $matches);
print_r($matches);
?>
(output)
Array
(
[0] => Array
(
[0] => 545
[1] => 5450
[2] => 54.20
[3] => 5450
[4] => 545,620
[5] => 545,620.96
[6] => 545
[7] => 5450
[8] => 54.20
[9] => 5450
[10] => 545,620
[11] => 545,620.96
)
)
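One caveat with [\d.,]+ is that it also picks up punctuation that happens to follow a price, such as the full stop in "£12." at the end of a sentence. If that matters for the scraped text, a stricter pattern is one option. The sketch below is an assumption rather than part of the answer above: $strict and the sample string are made up for illustration, and the file and input are assumed to be UTF-8.

```php
<?php
// $loose is the pattern from the answer; $strict is a hypothetical
// tighter variant that only accepts ",ddd" groups and a two-digit decimal.
$loose  = '/(?<=£|GBP)[\d.,]+/';
$strict = '/(?<=£|GBP)(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?/';

// Made-up sample text with prices followed by punctuation.
$str = 'Was £12. Now GBP7, or £545,620.96 for the set, £54.20 each.';

preg_match_all($loose, $str, $looseMatches);
preg_match_all($strict, $str, $strictMatches);

print_r($looseMatches[0]);  // includes "12." and "7,"
print_r($strictMatches[0]); // "12" and "7" without the punctuation
```

Both patterns return the same text for well-formed prices like 545,620.96 and 54.20; they differ only on the trailing punctuation.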
Upvotes: 1