tleif

Reputation: 95

Perl RegEx non-capturing group with alternative capturing within the group

I'm trying to parse out some mail logs that have the three following possible formats for the relay.

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<[email protected]>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<[email protected]>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=[0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<[email protected]>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com

With this code:

my $topat    = '^(\w{3})\s{1,2}(\d{1,2}) (\d{2}:\d{2}:\d{2}).+ sendmail\[\d.+\]: (\w+): to=<(\S+)>(?:,|, \[more\],) delay.+, relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)';

foreach my $line(@i) {
  if($line =~ /$topat/){
    my ($month, $day, $time, $id, $addy, $relay, $stat) = ($line =~ m/$topat/);
     print $line;
     print "$addy $relay $stat\n";
  }
}

I get the following errors:

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<[email protected]>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $stat in concatenation (.) or string at ./reg_test line 26.
[email protected] 0.0.0.0 

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<[email protected]>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=[0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $stat in concatenation (.) or string at ./reg_test line 26.
[email protected] 0.0.0.0 

Oct 24 03:49:10 mxout/mxout/1.1.1.1 sendmail[4642]: x9NA4Wbp011336: to=<[email protected]>, delay=1+00:44:37, xdelay=00:00:00, mailer=esmtp, pri=459, relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred: Connection reset by mail-company.com
Use of uninitialized value $relay in concatenation (.) or string at ./reg_test line 26.
[email protected]  mail-company.com

In the first two cases it correctly captures the address and relay but not the stat. In the third it captures the address, but $relay comes back empty and $stat holds the relay instead.

I've tried a number of different configurations and groupings and I can't seem to find the right solution. Any pointers would be much appreciated.

Upvotes: 4

Views: 144

Answers (1)

choroba

Reputation: 241868

You have two alternatives in the relay field:

relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.)
                    ^    ----      $6         ----     ^  | ^$7^ 

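Each alternative has its own capture group, so the pattern has eight groups in total while the list assignment only takes seven. If the line doesn't follow the first pattern but matches the second one, the relay ends up in $7, which the assignment hands to $stat, and $relay (i.e. $6) stays undefined. $stat is never populated correctly, because the stat is captured in $8, not $7.

You can use a branch reset group, (?|...), available since Perl 5.10, which reuses the same capture numbers in every alternative: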

(?|(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.)
  ^
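
As a minimal sketch of the branch reset behaviour, here is just the relay part of the pattern run against shortened versions of the three relay formats from the question. The sample strings and the names $relay_re, @samples and @relays are made up for illustration; the rest of the original pattern is left out.

```perl
use strict;
use warnings;

# Relay field only, with a branch reset group: whichever alternative
# matches, its capture is numbered $1.
my $relay_re = qr/relay=(?|(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), /;

# Shortened, hypothetical fragments modelled on the three log formats.
my @samples = (
    'pri=459, relay=mail-company.com. [0.0.0.0], tls=no',
    'pri=459, relay=[0.0.0.0], tls=no',
    'pri=459, relay=mail-company.com., tls=no',
);

my @relays;
for my $field (@samples) {
    # List-context match returns the single capture for either alternative.
    my ($relay) = $field =~ $relay_re;
    push @relays, $relay;
    print "$relay\n";
}
```

With the group count the same for every alternative, the seven-variable assignment from the question should line up again once (?| replaces the outer (?: in $topat.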

Or, use the original regex and populate two variables:

    my ($month, $day, $time, $id, $addy, $relay, $relay_alt, $stat) = $line =~ m/$topat/;
    $relay //= $relay_alt;
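
A small sketch of this second approach, again on a cut-down pattern (relay and stat only) and shortened, hypothetical sample strings. The names $re, @samples and @results are only for illustration. The alternation is unchanged from the question, so the IP and the hostname occupy two separate groups and the stat moves one position later.

```perl
use strict;
use warnings;

# Original (non-reset) alternation: group 1 is the IP, group 2 the
# hostname, group 3 the stat.
my $re = qr/relay=(?:(?:\S+ )?\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]|(\S+)\.), .+, stat=(.+)/;

# Shortened, hypothetical fragments modelled on the question's log lines.
my @samples = (
    'relay=mail-company.com. [0.0.0.0], tls=no, dsn=4.0.0, stat=Deferred',
    'relay=mail-company.com., tls=no, dsn=4.0.0, stat=Deferred',
);

my @results;
for my $field (@samples) {
    my ($relay, $relay_alt, $stat) = $field =~ $re;
    # The group that did not take part in the match is undef, so fall
    # back to the hostname when no IP was captured.
    $relay //= $relay_alt;
    push @results, "$relay $stat";
    print "$relay $stat\n";
}
```

The defined-or assignment (//=) also needs Perl 5.10 or later, so neither option helps on an older perl.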

Upvotes: 3
