Pr0no

Reputation: 4099

Complex regex to filter out numbers from string, but with exceptions

I need a fairly complex regex to accomplish the following:

> replace numbers in a string, e.g. 700, 12.43, with a label (format: {NUMBER:xx})
> ignore: when the number is between {braces}, e.g. {7}, {7th}
> ignore: when any character is attached to the number, e.g. G3, 7x, 1/2
> except: when the number is
          > preceded by $, e.g. $840
          > followed by .!?:, e.g. 33! 45.65?  4...

Taken all together:

Buy 4 {5} G3 Mac computers for 80% at $600 or 2 for 1/2 price: 200... 
dollar. Twice - 2x - as cheap!

Desired output:

Buy {NUMBER:4} {5} G3 Mac computers for 80% at 
$ {NUMBER:600} or {NUMBER:2} for 1/2 price: 
{NUMBER:200} ... dollar. Twice - 2x - as cheap!

I now have this:

preg_replace("/(?<!{)(?>[0-9]+(?:\.[0-9]+)?)(?!})/", " {NUMBER:$0} ", $string);

which outputs:

Buy {NUMBER:4} {5} G {NUMBER:3} Mac computers for {NUMBER:80} % at 
$ {NUMBER:600} or {NUMBER:2} for {NUMBER:1} / {NUMBER:2} price: 
{NUMBER:200} ... dollar. Twice - {NUMBER:2} x - as cheap!

In other words: the ignore rules and their exceptions aren't working yet, and I don't know how to implement them properly. Can anyone help me out?

Upvotes: 1

Views: 178

Answers (2)

Tim Pietzcker

Reputation: 336168

This works for your test cases and follows your rules, assuming that braces are correctly matched and unnested:

$result = preg_replace(
    '/(?<!\{)        # Assert no preceding {
    (?<![^\s$])      # Assert no preceding non-whitespace except $
    \b               # Match start of number
    (\d+(?:\.\d+)?+) # Match number (optional decimal part)
    \b               # Match end of number
    (?![^{}]*\})     # Assert that next brace is not a closing brace
    (?![^\s.!?,])    # Assert no following non-whitespace except .!?,
    /x', 
    '{NUMBER:\1}', $string);
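
To try the pattern above on the sentence from the question, a minimal standalone script could look like the sketch below. The pattern is copied unchanged from this answer; the test sentence is written on a single line here, which is an assumption about how the input is stored.

```php
<?php
// Pattern copied from the answer above. The /x flag enables extended mode,
// so whitespace and the # comments inside the pattern are ignored.
$pattern = '/(?<!\{)        # Assert no preceding {
    (?<![^\s$])      # Assert no preceding non-whitespace except $
    \b               # Match start of number
    (\d+(?:\.\d+)?+) # Match number (optional decimal part)
    \b               # Match end of number
    (?![^{}]*\})     # Assert that next brace is not a closing brace
    (?![^\s.!?,])    # Assert no following non-whitespace except .!?,
    /x';

// Test sentence from the question (assumed to be on one line).
$string = 'Buy 4 {5} G3 Mac computers for 80% at $600 or 2 for 1/2 price: 200... dollar. Twice - 2x - as cheap!';

$result = preg_replace($pattern, '{NUMBER:\1}', $string);
echo $result, PHP_EOL;
// Buy {NUMBER:4} {5} G3 Mac computers for 80% at ${NUMBER:600} or {NUMBER:2} for 1/2 price: {NUMBER:200}... dollar. Twice - 2x - as cheap!
```

Note that the final lookahead accepts a trailing comma but not a colon. If a number directly followed by : should also be labelled, as the rule list in the question suggests, adding : to that character class would probably be needed.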

Upvotes: 2

Eugen Rieck

Reputation: 65274

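$string="Buy 4 {5} G3 Mac computers for 80% at \$600 or 2 for 1/2 price: 200... \ndollar. Twice - 2x - as cheap!";
// Lookarounds check the neighbouring characters without consuming them, so the
// $, spaces and punctuation around each number survive the replacement.
// The atomic group (?>...) keeps a decimal part from being split off on backtracking.
$pattern='/(?<![^\s$])(?>[0-9]+(?:\.[0-9]+)?)(?![^\s.!?:,])/';
//$count=preg_match_all($pattern, $string, $matches);
//echo "$count\n";
//print_r($matches[0]);
echo preg_replace($pattern,'{NUMBER:$0}',$string);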
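
To check a pattern against each rule from the question separately, a small table of inputs and expected outputs can help. The sketch below is only an illustration: the helper name `labelNumbers()` is hypothetical, and the pattern assumes that "between braces" means the number directly touches a brace, as in {7}. A number with spaces inside braces, such as { 7 }, would still be labelled.

```php
<?php
// Hypothetical helper (not from the answers): labels standalone numbers
// using lookarounds only, so no surrounding character is consumed.
function labelNumbers(string $text): string
{
    $pattern = '/(?<![^\s$])                # preceded by start of string, whitespace or $
                (?>[0-9]+(?:\.[0-9]+)?)     # number; atomic, so the decimal part is never given back
                (?![^\s.!?:,])              # followed by end of string, whitespace or one of .!?:,
               /x';
    return preg_replace($pattern, '{NUMBER:$0}', $text);
}

// Inputs taken from the rule list in the question, with the expected result for each.
$cases = [
    '700 and 12.43'    => '{NUMBER:700} and {NUMBER:12.43}',
    '{7} {7th}'        => '{7} {7th}',
    'G3 7x 1/2'        => 'G3 7x 1/2',
    '$840'             => '${NUMBER:840}',
    '33! 45.65?  4...' => '{NUMBER:33}! {NUMBER:45.65}?  {NUMBER:4}...',
];

foreach ($cases as $input => $expected) {
    $actual = labelNumbers($input);
    echo ($actual === $expected ? 'ok  ' : 'FAIL'), ' ', $input, ' => ', $actual, PHP_EOL;
}
```

Keeping one case per rule makes it easier to see which rule breaks when the pattern is changed.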

Upvotes: 1
