Reputation: 115
I'm having a problem with a regex I created. My company scans an error file and tries to match its contents against a set of known strings. If one of the strings is found, that specific error has occurred, and we send the client an e-mail listing the errors that occurred, with a count for each set of strings.
So let's say I have the string:
Number 123456789: Duplicate transaction detected
Number 543267890: is a duplicate for this vendor
Password error number 987654321
The total can not be negative
Then I would search for all the "Duplicate" errors, the "Password" errors and the "Negative" errors. Each type of error has a set of strings that might indicate it.
I'm running the following regex to get the Duplicate errors:
number_of_errors = re.subn(
r"(is a duplicate for this vendor|Duplicate transaction detected)", "", string,
)[1]
The number_of_errors variable holds the number of times the regex matched in the string.
It was working fine until the third-party software that does the error handling started to create the file differently.
Right now the file might look like:
Number 123456789: Duplicate transaction detected because it is a duplicate for this vendor
Number 543267890: is a duplicate for this vendor
Password error number 987654321
The total can not be negative
As you can see, right now the first line would be counted twice, because the regex matches both strings in the first line.
Is there any way to match only once per line in the regex?
Thanks in advance!
Upvotes: 4
Views: 1683
Reputation: 18641
Yes, use
r"(?m)^.*?(is a duplicate for this vendor|Duplicate transaction detected)"
The (?m)^.*? part anchors every match at the start of a line: with the (?m) flag the caret matches at each line start, and .*? matches zero or more characters other than line breaks, as few as possible. Since every match has to begin at a line start, each line can be matched (and counted) at most once.
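As a quick check, here is a minimal sketch that runs both patterns against the sample file from the question. The variable names (`log`, `old_pattern`, `new_pattern`) are only for illustration and are not from the original code.

```python
import re

# Sample file content copied from the question.
log = (
    "Number 123456789: Duplicate transaction detected because it is a duplicate for this vendor\n"
    "Number 543267890: is a duplicate for this vendor\n"
    "Password error number 987654321\n"
    "The total can not be negative"
)

# Original pattern: can match several times on the same line.
old_pattern = r"(is a duplicate for this vendor|Duplicate transaction detected)"
# Anchored pattern: every match must start at the beginning of a line.
new_pattern = r"(?m)^.*?(is a duplicate for this vendor|Duplicate transaction detected)"

# re.subn returns (new_string, number_of_substitutions); we only need the count.
old_count = re.subn(old_pattern, "", log)[1]
new_count = re.subn(new_pattern, "", log)[1]

print(old_count)  # 3: the first line is counted twice
print(new_count)  # 2: one per matching line
```

If only the count is needed, `len(re.findall(new_pattern, log))` should give the same number without building the replaced string.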
Upvotes: 3
Reputation: 106
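If you check the signature re.subn(pattern, repl, string, count=0, flags=0), you will see that there is a count parameter, which you can set to 1. Note that count caps the total number of replacements in the string you pass in, so to count once per line you would have to call it on each line separately.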
Upvotes: 0
Reputation: 79338
You can use the count parameter of the re.subn function. It sets the maximum number of replacements to be made.
number_of_errors = re.subn(
r"(is a duplicate for this vendor|Duplicate transaction detected)", "", string, count=1,
)[1]
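One caveat: `count` limits the replacements over the whole string handed to `re.subn`, not per line, so on the multi-line file the call above would report at most 1. A minimal sketch of applying it line by line instead, assuming the file content is already in memory; `count_matching_lines` is a hypothetical helper name, not part of the original code:

```python
import re

# Same alternatives as in the question, compiled once for reuse.
DUPLICATE = re.compile(r"is a duplicate for this vendor|Duplicate transaction detected")

def count_matching_lines(text):
    """Count lines containing at least one of the duplicate-error strings.

    Hypothetical helper: count=1 stops after the first replacement,
    so each line contributes either 0 or 1 to the total.
    """
    return sum(DUPLICATE.subn("", line, count=1)[1] for line in text.splitlines())

log = (
    "Number 123456789: Duplicate transaction detected because it is a duplicate for this vendor\n"
    "Number 543267890: is a duplicate for this vendor\n"
    "Password error number 987654321\n"
    "The total can not be negative"
)

whole_string_count = DUPLICATE.subn("", log, count=1)[1]  # capped at 1 for the entire text
per_line_count = count_matching_lines(log)

print(whole_string_count)  # 1
print(per_line_count)      # 2
```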
Upvotes: 0