pmelon

Reputation: 187

Repeated regex/PCRE named capturing groups with a match in front

I'm trying to parse sendmail logs. These come split - with the 'to' and 'from' on different lines. I want to match the 'to' in order to establish that the line we're looking at is a 'to' line then catch as many email addresses as are present. There are many requests for help similar to this, but none (that I've found and I promise I have been looking!) that quite fit the same scenario.

I have tried working from several solutions on Stack Overflow without success. The issue is that the 'to=' is not optional, it is a requirement. Is this possible with a PCRE regex?

Regex thus far (that only matches the first email address):

to\=((\<)?(?P<to>.+?\@.+?)(\>)?\,)

Example line:

Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: [email protected],[email protected],[email protected], delay=00:00:00, xdelay=00:00:00, mailer=smtp, pri=91785, relay=relay.example.derp [1.2.3.4], dsn=2.0.0, stat=Sent (<[email protected]> Queued mail for delivery)

Ideally the matching after the 'to\=' would then match as many email addresses as are present, not just the first. If there is an answer out there that would work and that I have missed or been unable to bend to my scenario, apologies.

Upvotes: 1

Views: 43

Answers (2)

pmelon

Reputation: 187

This is what I ended up using:

,\s*delay=.+|(?<=to=|,),?(<)?(?<to>[^@,=]+@[^<>\,]+) 

It won't be perfect, but it works for me.
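For anyone trying this outside PCRE: below is a rough sketch of the same idea in Python. It is an adaptation, not the exact pattern above, because Python's `re` module needs fixed-width look-behinds and the `(?P<name>...)` group syntax. The function name `recipients` and the sample line (with made-up addresses) are assumptions for illustration only.

```python
import re

# Python adaptation (an assumption, not the exact PCRE pattern above):
#  - (?<=to=|,) is split into two fixed-width look-behinds
#  - (?<to>...) becomes (?P<to>...)
PATTERN = re.compile(
    r",\s*delay=.+"  # first alternative: swallow the rest of the line from ', delay='
    r"|(?:(?<=to=)|(?<=,)),?(<)?(?P<to>[^@,=]+@[^<>,]+)"
)

def recipients(line):
    """Collect the 'to' group from every match that actually captured an address."""
    return [m.group("to") for m in PATTERN.finditer(line) if m.group("to")]

# Hypothetical log line; the addresses are invented for this example.
sample = (
    "Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: "
    "to=alice@example.com,bob@example.com,carol@example.org, "
    "delay=00:00:00, xdelay=00:00:00, mailer=smtp, stat=Sent"
)
print(recipients(sample))
```

The first alternative consumes everything from `, delay=` onwards, so the later comma-separated fields are never tested for addresses; matches from that alternative leave the `to` group empty and are filtered out.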

Upvotes: 0

The fourth bird

Reputation: 163352

You could use the \G anchor to get iterative matches. It asserts the position at the end of the previous match, so after the initial to= each following email address can be matched in turn and captured in a capturing group.

(?:to=|\G(?!^))([^,\s@]+@[^@,\s]+),

Explanation

  • (?: Non capturing group
    • to= match literally
    • | Or
    • \G(?!^) Assert the position at the end of the previous match, but not at the start of the string
  • ) Close non capturing group
  • ( Capture group 1
    • [^,\s@]+@[^@,\s]+ Two negated character classes with an @ in between: each matches 1+ chars other than a comma, @ or whitespace
  • ), Close group 1 and match comma
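
If you need this outside a PCRE-style engine, note that Python's built-in `re` module does not support `\G`. A minimal sketch of how the same iteration could be emulated there, assuming the addresses follow `to=` directly and each one ends with a comma: `Pattern.match(string, pos)` only matches starting exactly at `pos`, which stands in for `\G`. The function name `extract_recipients` and the sample line (with made-up addresses) are hypothetical.

```python
import re

# Same pattern as capture group 1 in the regex above, followed by the comma.
ADDRESS = re.compile(r"([^,\s@]+@[^@,\s]+),")

def extract_recipients(line):
    """Return the comma-terminated addresses that directly follow 'to='."""
    start = line.find("to=")
    if start == -1:
        return []  # not a 'to' line
    pos = start + len("to=")
    found = []
    while True:
        # match() with a pos argument must match right at pos,
        # which plays the role of \G (end of the previous match).
        m = ADDRESS.match(line, pos)
        if m is None:
            break
        found.append(m.group(1))
        pos = m.end()
    return found

# Hypothetical log line; the addresses are invented for this example.
sample = (
    "Jul 16 13:35:05 mailserver sendmail[30892]: xxxxxxxxxxxxxx: "
    "to=alice@example.com,bob@example.com,carol@example.org, "
    "delay=00:00:00, xdelay=00:00:00, mailer=smtp, stat=Sent"
)
print(extract_recipients(sample))
```

Unlike the regex, this sketch only looks at the first `to=` in the line, which should be enough for a single log entry.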


Upvotes: 1
