I'm trying to parse sendmail logs. These come split, with the 'to' and 'from' on separate lines. I want to match the 'to' to establish that the line we're looking at is a 'to' line, and then capture as many email addresses as are present. There are many requests for help similar to this, but none (that I've found, and I promise I have been looking!) that quite fit the same scenario.
I have tried working from several solutions on Stack Overflow without success. The issue is that the 'to=' is not optional; it is a requirement. Is this possible with a PCRE regex?
Regex thus far (that only matches the first email address):
to\=((\<)?(?P<to>.+?\@.+?)(\>)?\,)
Example line:
Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: [email protected],[email protected],[email protected], delay=00:00:00, xdelay=00:00:00, mailer=smtp, pri=91785, relay=relay.example.derp [1.2.3.4], dsn=2.0.0, stat=Sent (<[email protected]> Queued mail for delivery)
Ideally, after the 'to\=' the regex would match as many email addresses as are present, not just the first. If there is an existing answer that would work and that I have missed or been unable to adapt to my scenario, apologies.
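To see why only the first address comes back, here is a minimal Python sketch that runs the regex above through Python's re module, which happens to accept it verbatim ((?P<to>...) and the backslash-escaped punctuation are both valid there). The log line is hypothetical: the real addresses in the example above are not recoverable, so LINE uses placeholder example.com addresses and assumes no angle brackets around them. Even with finditer, which keeps scanning after each match, every match has to begin with the literal to=, and that occurs only once on the line.

```python
import re

# Hypothetical sendmail line: placeholder addresses, not the original ones.
LINE = (
    "Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: "
    "to=alice@example.com,bob@example.com,carol@example.com, "
    "delay=00:00:00, xdelay=00:00:00, mailer=smtp, pri=91785, "
    "relay=relay.example.derp [1.2.3.4], dsn=2.0.0, "
    "stat=Sent (<msgid@example.com> Queued mail for delivery)"
)

# The question's regex, unchanged.
QUESTION_RE = re.compile(r"to\=((\<)?(?P<to>.+?\@.+?)(\>)?\,)")

# finditer looks for further matches, but each one must start with "to=",
# so bob@... and carol@... can never be reached.
found = [m.group("to") for m in QUESTION_RE.finditer(LINE)]
print(found)  # ['alice@example.com']
```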
This is what I ended up using:
,\s*delay=.+|(?<=to=|,),?(<)?(?<to>[^@,=]+@[^<>\,]+)
It won't be perfect, but it works for me.
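For comparison, a minimal Python sketch of the pattern above. Python's re module will not compile it as written, so the translation here is an assumption, not part of the original post: the variable-width lookbehind (?<=to=|,) is split into two fixed-width lookbehinds, (?<to>...) becomes (?P<to>...), and the unnecessary \, escape inside the character class is dropped. LINE is a hypothetical line with placeholder addresses, and to_addresses is a hypothetical helper name.

```python
import re

# Hypothetical sendmail line: placeholder addresses, not the original ones.
LINE = (
    "Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: "
    "to=alice@example.com,bob@example.com,carol@example.com, "
    "delay=00:00:00, xdelay=00:00:00, mailer=smtp, pri=91785, "
    "relay=relay.example.derp [1.2.3.4], dsn=2.0.0, "
    "stat=Sent (<msgid@example.com> Queued mail for delivery)"
)

# Assumed Python translation of:
#   ,\s*delay=.+|(?<=to=|,),?(<)?(?<to>[^@,=]+@[^<>\,]+)
ASKER_RE = re.compile(
    r",\s*delay=.+"                # branch 1: swallow ", delay=..." to end of line
    r"|(?:(?<=to=)|(?<=,)),?(<)?"  # branch 2: must follow "to=" or a comma
    r"(?P<to>[^@,=]+@[^<>,]+)"     # the address itself
)


def to_addresses(line):
    # Branch 1 also produces matches, but with the "to" group unset (None).
    return [m.group("to") for m in ASKER_RE.finditer(line) if m.group("to")]


print(to_addresses(LINE))
# ['alice@example.com', 'bob@example.com', 'carol@example.com']
```

The ,\s*delay=.+ branch consumes everything from the first ", delay=" to the end of the line, so nothing after the recipient list can be picked up by the address branch; those matches leave the to group unset, which is why the helper filters out None.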
You could make use of the \G anchor to get iterative matches, asserting the position at the end of the previous match, and capture each email address in a capturing group.
(?:to=|\G(?!^))([^,\s@]+@[^@,\s]+),
Explanation
- (?: Non-capturing group
  - to= Match literally
  - | Or
  - \G(?!^) Assert the position at the end of the previous match, not at the start of the string
- ) Close the non-capturing group
- ( Capture group 1
  - [^,\s@]+@[^@,\s]+ Negated character classes: match one or more characters other than a comma, @ or whitespace, then an @, then one or more characters other than @, a comma or whitespace
- ), Close group 1 and match a comma
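Python's built-in re module has no \G anchor, so the pattern above cannot be used there directly. A minimal sketch under that assumption emulates it instead: Pattern.match(string, pos) is anchored at pos, which is exactly the "continue where the previous match ended" behaviour \G(?!^) provides. LINE is a hypothetical line with placeholder addresses, and FIRST, CHAINED and to_addresses are hypothetical names, not part of the answer.

```python
import re

# Hypothetical sendmail line: placeholder addresses, not the original ones.
LINE = (
    "Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: "
    "to=alice@example.com,bob@example.com,carol@example.com, "
    "delay=00:00:00, xdelay=00:00:00, mailer=smtp, pri=91785, "
    "relay=relay.example.derp [1.2.3.4], dsn=2.0.0, "
    "stat=Sent (<msgid@example.com> Queued mail for delivery)"
)

# Group 1 of the answer's pattern plus the trailing comma.
ADDR = r"([^,\s@]+@[^@,\s]+),"

# The "to=" alternative: may start anywhere, so it is searched for.
FIRST = re.compile(r"to=" + ADDR)
# Tried only where the previous match ended; trying "to=" first mirrors
# the order of the alternation in the original pattern.
CHAINED = re.compile(r"(?:to=)?" + ADDR)


def to_addresses(line):
    r"""Emulate global matching of (?:to=|\G(?!^))([^,\s@]+@[^@,\s]+), in re."""
    found = []
    pos = 0
    chained = False  # True once a previous match exists, like \G(?!^)
    while True:
        # Pattern.match(line, pos) is anchored at pos: the job \G does.
        m = CHAINED.match(line, pos) if chained else None
        if m is None:
            # No continuation here: scan forward for a fresh "to=" instead.
            m = FIRST.search(line, pos)
            if m is None:
                return found
        found.append(m.group(1))
        pos = m.end()
        chained = True


print(to_addresses(LINE))
# ['alice@example.com', 'bob@example.com', 'carol@example.com']
```

As with the original pattern, each address must be followed by a comma, which holds for the recipient list in the example line because it is followed by ", delay=". A line without to=, such as a 'from' line, yields an empty list.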