user1420094

Reputation: 21

Get all currency/price expressions from a block of text

I know this topic has been covered to some degree, but after a few days I'm still having trouble figuring out the best way to parse a price from a block of text.

Here are some examples:

This car is $15k and has $200 in upgrades

and

Those belts are USD 500.00 and I'm asking 50 for shipping

My approach has been to do three separate Regex matches:

  1. To find the prices that are abbreviated with a K
  2. To find the prices with a prefix
  3. To find the prices with a suffix

look for dollars with thousands abbreviated

preg_match_all('/^[0-9,]+(\.[0-9]{2})?(k)+$/', 
                    strtolower($description), $price_array1);

look for dollars with prefixes

preg_match_all('/^(\$|\$ |price|price |price is |price:|price: |us|us |usd|usd |asking|asking |wanting|wanting |want|want |sgd|euro|euro |£|£ |€|€ |gbp|gbp |cdn|cdn |)+[0-9,]+(\.[0-9]{2})?$/', strtolower($description), $price_array2);

look for dollars with suffixes

preg_match_all('/(\$[0-9,]+(\.[0-9]{2})?)( eur|eur| firm| obo| shipped| \$|\$| €|€| £|£| gbp|gbp| dollar| aud)+/', strtolower($description), $price_array3);

But none of these seem to be working. I think my regex is correct, so I'm not sure why they're not matching anything.

I will admit I'm a bit confused about whether I should use ^ and $ but I've tried it with and without and it doesn't seem to make a difference.

Upvotes: 2

Views: 1430

Answers (1)

nhahtdh

Reputation: 56829

This is my solution for strictly matching money-like numbers (it ignores any prefix or suffix, including k for thousand):

/(?<![0-9.,])(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?![0-9.,])/

It will accept 34563745,34534, 283947982.234, 283.432, 234424., 4234,4324, 2.234.434,23442, 3,234,234.234, 324849000. But it will reject .453985, ..,.,.434.,.34, 234,43.234, 23467,4443.234.
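As a quick check, the pattern above can be run through preg_match_all in PHP. This is only a minimal sketch: the helper name findPlainNumbers is made up for illustration and is not part of the original solution.

```php
<?php
// Hypothetical helper (name chosen for this sketch): returns every
// money-like number in $text using the strict pattern shown above.
function findPlainNumbers(string $text): array
{
    $pattern = '/(?<![0-9.,])(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?'
             . '|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?![0-9.,])/';
    preg_match_all($pattern, $text, $matches);
    return $matches[0]; // full matches only
}

print_r(findPlainNumbers('This car is $15k and has $200 in upgrades'));
print_r(findPlainNumbers("Those belts are USD 500.00 and I'm asking 50 for shipping"));
```

The first call yields 15 and 200 (the $ and the k are left out of the match), and the second yields 500.00 and 50.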

The following also matches an optional case-insensitive prefix and an optional k (for thousand), in addition to plain numbers:

/(?<= |^)(?:(?i)(?:\$|USD) *)?(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?:(?i)k)?(?![0-9.,])/

If you want to add more prefixes, you can change this part of the regex:

(?:\$|USD)

Just add more prefixes as alternatives, without leading or trailing spaces. The regex already allows any number of spaces between the prefix and the number.
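One way to keep the prefix list easy to extend is to build the pattern from an array. The sketch below assumes a hypothetical helper, buildPrefixPriceRegex, and uses "GBP" and "asking" purely as example additions; preg_quote escapes characters such as $ so each prefix is matched literally.

```php
<?php
// Hypothetical helper (not from the answer): builds the prefix-aware
// pattern from a list of prefixes, escaping each one for use in a regex.
function buildPrefixPriceRegex(array $prefixes): string
{
    $quoted = array_map(function ($p) {
        return preg_quote($p, '/');
    }, $prefixes);
    // Same number sub-pattern as in the regexes above.
    $number = '(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?'
            . '|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)';
    return '/(?<= |^)(?:(?i)(?:' . implode('|', $quoted) . ') *)?'
         . $number . '(?:(?i)k)?(?![0-9.,])/';
}

// "GBP" and "asking" are example additions chosen for this sketch.
$pattern = buildPrefixPriceRegex(['$', 'USD', 'GBP', 'asking']);
preg_match_all($pattern, "I'm asking 50 for shipping and GBP 1,200 for the belts", $m);
print_r($m[0]);
```

With this input the matches are "asking 50" and "GBP 1,200". Because the prefix group is optional, plain numbers with no prefix still match as well.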

The following will only match numbers followed by a suffix (with an optional thousands indicator):

/(?<= |^)(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)(?:(?i)(?:k )? *(?:\$|USD))(?= |$)/

As above, extend the (?:\$|USD) part if you want to add more suffixes.
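A short sketch of the suffix-only pattern in use, assuming a hypothetical wrapper named findSuffixedPrices:

```php
<?php
// The suffix-only pattern from above, split across lines for readability.
$pattern = '/(?<= |^)(?:[0-9]{1,3}(?:,?[0-9]{3})*(?:\.[0-9]*)?'
         . '|[0-9]{1,3}(?:\.?[0-9]{3})*(?:,[0-9]*)?)'
         . '(?:(?i)(?:k )? *(?:\$|USD))(?= |$)/';

// Hypothetical wrapper (name is an assumption for this sketch).
function findSuffixedPrices(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

print_r(findSuffixedPrices($pattern, 'Selling for 15k USD or 200$ each'));
print_r(findSuffixedPrices($pattern, "I'm asking 50 for shipping"));
```

The first call matches "15k USD" and "200$"; the second returns an empty array because 50 has no suffix. Note that this pattern expects a space after k, so a string like 15kUSD is not matched.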

Test input to try it out:

Here's are some examples: This car is $15k and has $200 in upgrades Those belts are USD 500.00 and I'm asking 50 for shipping 345,345.45 495.344,424 ..,5435 878,543.455.345 345345435.545 234728394,34345 345, 453. 0.4355 .453 sdfsd usd 23423423K

Upvotes: 2
