luison

Reputation: 2100

Regexp: match if line contains XXX but does not contain YYY

We are trying to write a regexp that filters bounced emails, distinguishing permanent bounces from SPAM rejections or temporarily undeliverable messages.

Our idea is to match certain words the line could contain (code + word) but ignore the whole line if it contains others such as (SPAM|temporarily undeliverable|disk quota exceeded), since those would not be considered permanent bounces. We've managed the first part and found a couple of answers here about negative regexps (http://stackoverflow.com/questions/1153856/string-negation-using-regular-expressions), but so far we have been completely unsuccessful in combining both in a single expression.

Something like:

.*(5.3.0|5.1.0).*(User unknown|invalid|Unknown address|doesn't have a)

but it should not match if any of the excluded words appears anywhere on the same line. Something like:

^(?!(SPAM|temporarily undeliverable|disk quota exceeded)).*$

So the first of the following lines should match, but the second should not:

Diagnostic-Code: smtp; 5.3.0 - Other mail system problem 554-"delivery error: dd This user doesn't have a btinternet.com account ([email protected]) [0] - mta1000.bt.mail.ird.yahoo.com" (delivery attempts: 0)

Diagnostic-Code: smtp; 5.1.0 - Unknown address error 550-'RCPT TO: Mailbox disk quota exceeded' (delivery attempts: 0)

Upvotes: 0

Views: 454

Answers (1)

stema

Reputation: 93026

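Your negative lookahead is only tested at the start of the string, so it only rejects lines that begin with one of those words. You just need to add a .* inside the lookahead so that it scans the whole line.

Try: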

^(?!.*(SPAM|temporarily undeliverable|disk quota exceeded)).*(5.3.0|5.1.0).*(User unknown|invalid|Unknown address|doesn't have a)

See it here on Regexr
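For anyone who wants to check this outside Regexr, here is a minimal Python sketch of the same idea. The function name is_permanent_bounce and the shortened sample lines are my own for illustration. I have also escaped the dots in the status codes and used non-capturing groups, which the pattern above does not do, so treat it as a variant rather than the exact expression.

```python
import re

# Words that mark a bounce as non-permanent (taken from the question).
EXCLUDE = r"SPAM|temporarily undeliverable|disk quota exceeded"

# Assumption: dots are escaped so "5.3.0" cannot match e.g. "5x3y0".
# The lookahead starts with .* so it scans the whole line, not just position 0.
PERMANENT_BOUNCE = re.compile(
    r"^(?!.*(?:" + EXCLUDE + r"))"
    r".*(?:5\.3\.0|5\.1\.0)"
    r".*(?:User unknown|invalid|Unknown address|doesn't have a)"
)


def is_permanent_bounce(line: str) -> bool:
    """Return True if the line looks like a permanent bounce (hypothetical helper)."""
    return PERMANENT_BOUNCE.search(line) is not None


# Shortened versions of the two sample lines from the question.
good = ("Diagnostic-Code: smtp; 5.3.0 - Other mail system problem "
        "554-\"delivery error: dd This user doesn't have a btinternet.com account\"")
bad = ("Diagnostic-Code: smtp; 5.1.0 - Unknown address error "
       "550-'RCPT TO: Mailbox disk quota exceeded' (delivery attempts: 0)")

print(is_permanent_bounce(good))  # True
print(is_permanent_bounce(bad))   # False
```

The second line contains both a wanted code and a wanted phrase, but the lookahead rejects it because "disk quota exceeded" appears later on the same line.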

Upvotes: 2
