taga

Reputation: 3885

Find number formats except for specific numbers with regex

I want to find numbers in a given format, except for specific values. For example, I want to find numbers like these:

1.214,41
4,431.43
143,134.43
355.352,41
443,113,134.43
365.115.352,41

And I can do it with this regex:

(\d{1,3}(,|.)){1,4}

The exception is when the number is 0.00 or 0,00. I know that I can exclude these numbers with:

^(0.00|0,00)

But I do not know how to combine both regexes.

My text looks like this. This is a minimal example; the real text is much longer and the phrases I am looking for are scattered throughout it:

Total 341,431.43
Saldo 0.00
Saldo 0,00
Total 1,431.43
Total 0,00
Saldo 0.60
...

And my full regex looks like this:

(Saldo|Total)\s(\d{1,3}(,|.)){1,4}

With:

re.search(regex, text)

I want to get:

Total 341,431.43
Total 1,431.43
Saldo 0.60
...

But sometimes I get rows with 0.00 or 0,00.

Upvotes: 1

Views: 97

Answers (2)

The fourth bird

Reputation: 163372

You might use

\b(?:Saldo|Total)\s(?!0[.,]00\b)\d{1,3}(?:,\d{3})*\.\d\d\b

The pattern matches:

  • \b A word boundary to prevent a partial match
  • (?:Saldo|Total)\s Match either Saldo or Total followed by a whitespace char
  • (?!0[.,]00\b) Negative lookahead, assert not 0.00 or 0,00 directly to the right
  • \d{1,3}(?:,\d{3})*\.\d\d Match 1-3 digits, then optionally repeated groups of a comma and 3 digits, then a dot and 2 digits
  • \b A word boundary

See a regex demo and a Python demo

import re

strings = [
    "Total 341,431.43",
    "Saldo 0.00",
    "Saldo 0,00",
    "Total 1,431.43",
    "Total 0,00",
    "Saldo 0.60"
]

pattern = r"\b(?:Saldo|Total)\s(?!0[.,]00\b)\d{1,3}(?:,\d{3})*\.\d\d\b"
for s in strings:
    m = re.search(pattern, s)
    if m:
        print(s)

Output

Total 341,431.43
Total 1,431.43
Saldo 0.60
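
The question also lists numbers that use . for thousands and , for decimals (for example 1.214,41), which the pattern above does not cover. A minimal sketch of one way to accept both styles, assuming the two separators are never mixed up within one number; the alternation and the sample text are illustrative assumptions, not part of the original pattern:

```python
import re

# Assumed extension: accept either "1,234.56" style or "1.234,56" style.
pattern = re.compile(
    r"\b(?:Saldo|Total)\s"
    r"(?!0[.,]00\b)"                  # reject exactly 0.00 or 0,00
    r"(?:"
    r"\d{1,3}(?:,\d{3})*\.\d\d"       # comma thousands, dot decimals
    r"|"
    r"\d{1,3}(?:\.\d{3})*,\d\d"       # dot thousands, comma decimals
    r")\b"
)

# Hypothetical sample text mixing both number styles
text = """Total 341,431.43
Saldo 0.00
Saldo 0,00
Total 1.214,41
Total 0,00
Saldo 0.60
Total 365.115.352,41"""

# finditer scans the whole text, so it also works when the phrases
# are scattered through a longer document
matches = [m.group(0) for m in pattern.finditer(text)]
print("\n".join(matches))
```

Using re.finditer instead of re.search returns every occurrence in the text rather than only the first one.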

Upvotes: 1

Mad Physicist

Reputation: 114350

You don't need regex for everything. If you're processing a bunch of independent lines, process them separately. In that case, you can apply as many tests as you need:

import re

incl = re.compile(r'(Saldo|Total)\s(\d{1,3}(,|.)){1,4}')
excl = {'0.00', '0,00'}
for line in text.splitlines():
    # Compare only the number (the last whitespace-separated token) against excl
    if incl.fullmatch(line) and line.split()[-1] not in excl:
        print(line)

Or you can build a list for later use:

result = [line for line in text.splitlines() if incl.fullmatch(line) and line.split()[-1] not in excl]
print('\n'.join(result))

If you're reading your data from a file, it's better to iterate over the file directly instead of using for line in text.splitlines():, and to strip the trailing newline from each line:

for line in file:
    line = line.rstrip('\n')
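
A self-contained sketch of the file-based variant, under a few assumptions: the stricter pattern with the number in its own capture group, the helper name keep_lines, and the use of io.StringIO as a stand-in for an open file are all illustrative and not part of the original answer.

```python
import io
import re

# Assumed pattern: the number is captured in group 1 so it can be tested alone
incl = re.compile(r"(?:Saldo|Total)\s(\d{1,3}(?:[.,]\d{3})*[.,]\d{2})")
excl = {"0.00", "0,00"}

def keep_lines(file):
    """Yield lines that fully match incl and whose number is not in excl."""
    for line in file:
        line = line.rstrip("\n")  # rstrip returns a new string, so reassign it
        m = incl.fullmatch(line)
        if m and m.group(1) not in excl:
            yield line

# io.StringIO stands in for a real file object here
sample = io.StringIO(
    "Total 341,431.43\nSaldo 0.00\nSaldo 0,00\n"
    "Total 1,431.43\nTotal 0,00\nSaldo 0.60\n"
)
result = list(keep_lines(sample))
print("\n".join(result))
```

With a real file, the same generator could be fed from a with open(...) block, so the whole file never has to be held in memory.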

Upvotes: 0
