Reputation: 2100
We are trying to build a regexp that filters bounced emails, distinguishing permanent bounces from SPAM rejections and temporary delivery failures.
Our idea is to grab certain words the line could contain (code + word) but ignore the whole line if it contains others such as (SPAM|temporarily undeliverable|disk quota exceeded) etc., as these would not be considered permanent bounces. We've managed the first part and found a couple of answers here about negative regexps (http://stackoverflow.com/questions/1153856/string-negation-using-regular-expressions), but so far we have been completely unsuccessful in combining both into a single expression.
Something like:
.*(5.3.0|5.1.0).*(User unknown|invalid|Unknown address|doesn't have a)
but it should not match if any of the excluded words appears anywhere on the same line. Something like:
^(?!(SPAM|temporarily undeliverable|disk quota exceeded)).*$
So the first of the following lines should match, but the second should not:
Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-"delivery error: dd This user doesn't have a btinternet.com account ([email protected]) [0] - mta1000.bt.mail.ird.yahoo.com" (delivery attempts: 0)
Diagnostic-Code: smtp; 5.1.0 - Unknown address error 550-'RCPT TO: Mailbox disk quota exceeded' (delivery attempts: 0)
Upvotes: 0
Views: 454
Reputation: 93026
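Your negative lookahead only checks the text immediately after the start of the string, so it rejects a line only when the line begins with one of those phrases. You just need to add a .* inside the lookahead so that it scans the whole line.
Try: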
^(?!.*(SPAM|temporarily undeliverable|disk quota exceeded)).*(5.3.0|5.1.0).*(User unknown|invalid|Unknown address|doesn't have a)
See it here on Regexr
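As a quick check, here is a minimal Python sketch that runs the pattern against the two sample lines from the question. The names `PERMANENT_BOUNCE`, `is_permanent_bounce`, `BOUNCE_LINE` and `QUOTA_LINE` are made up for illustration. The dots in the status codes are escaped here, which is an assumption about the intent: an unescaped `.` matches any character, so `5.3.0` would also match something like `5x3y0`.

```python
import re

# Pattern from the answer, with the dots in the status codes escaped
# (assumption: the codes are meant literally).
PERMANENT_BOUNCE = re.compile(
    r"^(?!.*(SPAM|temporarily undeliverable|disk quota exceeded))"
    r".*(5\.3\.0|5\.1\.0)"
    r".*(User unknown|invalid|Unknown address|doesn't have a)"
)

# The question's original negation, anchored at the start of the line only.
START_ONLY_NEGATION = re.compile(
    r"^(?!(SPAM|temporarily undeliverable|disk quota exceeded)).*$"
)

# Sample lines copied from the question.
BOUNCE_LINE = """Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-"delivery error: dd This user doesn't have a btinternet.com account ([email protected]) [0] - mta1000.bt.mail.ird.yahoo.com" (delivery attempts: 0)"""
QUOTA_LINE = "Diagnostic-Code: smtp; 5.1.0 - Unknown address error 550-'RCPT TO: Mailbox disk quota exceeded' (delivery attempts: 0)"


def is_permanent_bounce(line: str) -> bool:
    """Return True if the line looks like a permanent bounce."""
    return PERMANENT_BOUNCE.match(line) is not None


if __name__ == "__main__":
    for line in (BOUNCE_LINE, QUOTA_LINE):
        print(is_permanent_bounce(line), line[:50])
```

The start-only version still accepts the quota line because that line does not begin with an excluded phrase. With `.*` inside the lookahead, the check covers every position on the line.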
Upvotes: 2