Reputation: 21
I know this topic has been covered to some degree, but after a few days I'm still having trouble figuring out the best way to parse a price from a block of text.
Here are some examples:
This car is $15k and has $200 in upgrades
and
Those belts are USD 500.00 and I'm asking 50 for shipping
My approach has been to do three separate Regex matches:
preg_match_all('/^[0-9,]+(\.[0-9]{2})?(k)+$/',
strtolower($description), $price_array1);
preg_match_all('/^(\$|\$ |price|price |price is |price:|price: |us|us |usd|usd |asking|asking |wanting|wanting |want|want |sgd|euro|euro |£|£ |€|€ |gbp|gbp |cdn|cdn |)+[0-9,]+(\.[0-9]{2})?$/', strtolower($description), $price_array2);
preg_match_all('/(\$[0-9,]+(\.[0-9]{2})?)( eur|eur| firm| obo| shipped| \$|\$| €|€| £|£| gbp|gbp| dollar| aud)+/', strtolower($description), $price_array3);
But none of these seem to be working. I think my regex is correct, so I'm not sure why they're not matching anything.
I will admit I'm a bit confused about whether I should use ^ and $, but I've tried it with and without and it doesn't seem to make a difference.
Upvotes: 2
Views: 1430
Reputation: 56829
This is my solution to strictly match money-like numbers (it will not notice any prefix or suffix, even k for thousand):
/(?<![0-9.,])(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?![0-9.,])/
It will accept:
34563745,34534
283947982.234
283.432
234424.
4234,4324
2.234.434,23442
3,234,234.234
324849000
But it will reject:
.453985
..,.,.434.,.34
234,43.234
23467,4443.234
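To check the accept and reject lists above, here is a minimal PHP sketch. The pattern is copied unchanged from this answer; the helper name is_money_like is hypothetical and only exists for this illustration. It treats a string as accepted when the whole string is a single match.

```php
<?php
// Strict money-like number pattern, copied unchanged from the answer.
$pattern = '/(?<![0-9.,])(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?![0-9.,])/';

// Hypothetical helper (not part of the answer): true when $s as a whole is one match.
function is_money_like(string $pattern, string $s): bool
{
    return preg_match($pattern, $s, $m) === 1 && $m[0] === $s;
}

$accepted = ['34563745,34534', '283947982.234', '283.432', '234424.',
             '4234,4324', '2.234.434,23442', '3,234,234.234', '324849000'];
$rejected = ['.453985', '..,.,.434.,.34', '234,43.234', '23467,4443.234'];

// Print one line per sample so the two lists can be compared by eye.
foreach (array_merge($accepted, $rejected) as $s) {
    echo str_pad($s, 18), is_money_like($pattern, $s) ? 'match' : 'no match', PHP_EOL;
}
```

Because the lookbehind and lookahead forbid a digit, dot, or comma on either side, a sample made only of those characters either matches in full or not at all.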
The following also matches an optional case-insensitive prefix and an optional k suffix (for thousand), in addition to plain numbers:
/(?<= |^)(?:(?i)(?:\$|USD) *)?(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?:(?i)k)?(?![0-9.,])/
If you want to add more prefixes, change this part of the regex:
(?:\$|USD)
Add each new prefix as another alternative, without leading or trailing spaces. The regex already allows any number of spaces between the prefix and the number.
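One way to add prefixes without editing the regex by hand is to build the alternation from an array. This is only a sketch: '$' and 'USD' come from the answer, the other prefixes are illustrative picks from the question's list, and find_prices is a hypothetical helper name.

```php
<?php
// Prefixes to recognise; '$' and 'USD' are from the answer, the rest are illustrative.
$prefixes = ['$', 'USD', 'GBP', 'SGD', 'CDN'];

// preg_quote() escapes regex metacharacters such as '$'; '/' is the delimiter used below.
$alternation = implode('|', array_map(
    static function (string $p): string { return preg_quote($p, '/'); },
    $prefixes
));

// Number part, copied unchanged from the answer.
$number = '(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)';

$pattern = '/(?<= |^)(?:(?i)(?:' . $alternation . ') *)?' . $number . '(?:(?i)k)?(?![0-9.,])/';

// Hypothetical helper: return every price-like match found in $text.
function find_prices(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $m);
    return $m[0];
}

print_r(find_prices($pattern, 'This car is $15k and has $200 in upgrades'));
// matches: "$15k", "$200"
print_r(find_prices($pattern, "Those belts are USD 500.00 and I'm asking 50 for shipping"));
// matches: "USD 500.00", "50"
```

Note that the bare 50 is picked up too, because the prefix is optional in this pattern.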
The following matches only numbers followed by a currency suffix (with an optional thousand indicator):
/(?<= |^)(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?:(?i)(?:k )? *(?:\$|USD))(?= |$)/
As with the prefix version, extend the (?:\$|USD) group if you want to add more suffixes.
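A short sketch of the suffix-only pattern in use. The pattern is copied unchanged from this answer; the sample sentence is made up for illustration.

```php
<?php
// Suffix-only pattern, copied unchanged from the answer.
$pattern = '/(?<= |^)(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?:(?i)(?:k )? *(?:\$|USD))(?= |$)/';

// Hypothetical sample text mixing suffix-style and prefix-style prices.
$text = 'asking 500 usd shipped, was 15k $ new, not $200';

preg_match_all($pattern, $text, $m);
$matches = $m[0];

print_r($matches);
// matches: "500 usd", "15k $"
```

The prefix-style $200 is skipped because this pattern requires the currency marker after the number. Also note that, as written, the k is only recognised when a space follows it (15k $ matches, 15k$ does not).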
Test input to try it out:
Here's are some examples: This car is $15k and has $200 in upgrades Those belts are USD 500.00 and I'm asking 50 for shipping 345,345.45 495.344,424 ..,5435 878,543.455.345 345345435.545 234728394,34345 345, 453. 0.4355 .453 sdfsd usd 23423423K
Upvotes: 2