Reputation: 3885
I want to find numbers in a particular format, except for one specific value. For example, I want to find numbers like these:
1.214,41
4,431.43
143,134.43
355.352,41
443,113,134.43
365.115.352,41
And I can do it with this regex:
(\d{1,3}(,|.)){1,4}
The exception is when the number is 0.00 or 0,00. I know that I can exclude these numbers with:
^(0.00|0,00)
But I do not know how to combine both regexes.
My text looks like this. I have provided a minimal example; the real text is much longer, and the requested phrases are scattered throughout it:
Total 341,431.43
Saldo 0.00
Saldo 0,00
Total 1,431.43
Total 0,00
Saldo 0.60
...
And my full regex looks like this:
(Saldo|Total)\s(\d{1,3}(,|.)){1,4}
With:
re.search(regex, text)
I want to get:
Total 341,431.43
Total 1,431.43
Saldo 0.60
...
But sometimes I get rows with 0.00 or 0,00.
Upvotes: 1
Views: 97
Reputation: 163372
You might use
\b(?:Saldo|Total)\s(?!0[.,]00\b)\d{1,3}(?:,\d{3})*\.\d\d\b
The pattern matches:
\b - A word boundary to prevent a partial match
(?:Saldo|Total)\s - Match either Saldo or Total followed by a whitespace char
(?!0[.,]00\b) - Negative lookahead, assert not 0.00 or 0,00 directly to the right
\d{1,3}(?:,\d{3})*\.\d\d - Match 1-3 digits, optional repetitions of a comma and 3 digits, then . and 2 digits
\b - A word boundary
See a regex demo and a Python demo
import re

strings = [
    "Total 341,431.43",
    "Saldo 0.00",
    "Saldo 0,00",
    "Total 1,431.43",
    "Total 0,00",
    "Saldo 0.60"
]
pattern = r"\b(?:Saldo|Total)\s(?!0[.,]00\b)\d{1,3}(?:,\d{3})*\.\d\d\b"
for s in strings:
    m = re.search(pattern, s)
    if m:
        print(s)
Output
Total 341,431.43
Total 1,431.43
Saldo 0.60
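The pattern above only covers numbers written as 1,431.43 (comma for thousands, dot for decimals), while the question also lists numbers like 1.214,41. A minimal sketch of one way to accept both styles, assuming every number has exactly two decimals and never mixes separators; the name PATTERN and the sample text are made up for illustration:

```python
import re

# Assumption: a number is either 1,234.56 or 1.234,56, always with two decimals.
PATTERN = re.compile(
    r"""
    \b(?:Saldo|Total)\s      # label followed by one whitespace character
    (?!0[.,]00\b)            # reject exactly 0.00 and 0,00
    \d{1,3}                  # leading group of 1-3 digits
    (?:
        (?:,\d{3})*\.\d\d    # comma thousands, dot decimals: 1,234.56
      |
        (?:\.\d{3})*,\d\d    # dot thousands, comma decimals: 1.234,56
    )
    \b
    """,
    re.VERBOSE,
)

# Hypothetical sample text, one entry per line.
text = """Total 341,431.43
Saldo 0.00
Saldo 0,00
Total 1.431,43
Total 0,00
Saldo 0.60"""

# finditer scans the whole text, so it also works when the phrases
# are scattered through a longer document.
matches = [m.group(0) for m in PATTERN.finditer(text)]
print(matches)
```

Using re.VERBOSE keeps the two alternatives readable, and finditer avoids looping over lines when the text is one long string.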
Upvotes: 1
Reputation: 114350
You don't need regex for everything. If you're processing a bunch of independent lines, process them separately. In that case, you can apply as many tests as you need:
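import re

# Capture the number so it can be compared against the excluded values;
# [.,] matches a literal dot or comma (a bare . would match any character)
incl = re.compile(r'(?:Saldo|Total)\s((?:\d{1,3}[.,])+\d{2})')
excl = {'0.00', '0,00'}
for line in text.splitlines():
    m = incl.fullmatch(line)
    if m and m.group(1) not in excl:
        print(line)
Or you can build a list for later use (the := operator needs Python 3.8+):
result = [line for line in text.splitlines()
          if (m := incl.fullmatch(line)) and m.group(1) not in excl]
print('\n'.join(result))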
If you're getting your data from a file, it's better to iterate over the file directly instead of calling text.splitlines(), and to strip the trailing newline from each line yourself:
for line in file:
    line = line.rstrip('\n')
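Put together for a file-like source, the line-by-line approach might look like the sketch below. The names keep_lines, NUMBER_LINE and EXCLUDED are hypothetical, the regex is a tightened variant of the question's pattern rather than the asker's exact expression, and io.StringIO stands in for a real open file:

```python
import io
import re

# Assumption: each relevant line is exactly "<label> <number>", and the
# number is groups of 1-3 digits joined by . or , with two trailing digits.
NUMBER_LINE = re.compile(r"(?:Saldo|Total)\s((?:\d{1,3}[.,])+\d{2})")
EXCLUDED = {"0.00", "0,00"}


def keep_lines(lines):
    """Yield lines whose number part is not one of the excluded values."""
    for line in lines:
        line = line.rstrip("\n")  # rstrip returns a new string, so reassign
        m = NUMBER_LINE.fullmatch(line)
        if m and m.group(1) not in EXCLUDED:
            yield line


# io.StringIO behaves like an open text file for iteration purposes.
fake_file = io.StringIO(
    "Total 341,431.43\nSaldo 0.00\nSaldo 0,00\n"
    "Total 1,431.43\nTotal 0,00\nSaldo 0.60\n"
)
kept = list(keep_lines(fake_file))
print(kept)
```

Because keep_lines is a generator, it reads one line at a time and never loads the whole file into memory.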
Upvotes: 0