Reputation: 15359
I am reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I'm checking for a match on the "to=" line and grabbing the message ID. After building an array of MSGIDs, I'm looping back through the array to extract information on the to=, from=, and client= lines.
What I'd like to do is remove a line from the array as soon as I've extracted the data from it in order to make the processing a bit faster (i.e. one less line to check against).
Any suggestions? This is in Perl.
Edit: gbacon's answer below was enough to get me rolling with a solid solution. Here's the guts of it:
my %msg;
while (<>) {
    my $line = $_;
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
    }
    if ($line =~ s!^(\w+ \d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!!) {
        my $key = $3;
        push @{ $msg{$key}{date} }   => $1;
        push @{ $msg{$key}{server} } => $2;
    }
}
use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
I'm sure that second regexp can be made more impressive, but it gets the job done for what I need. I can now take the hash of all messages and pull out the ones I'm interested in.
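As a rough sketch of that last step, here is one way to pull out the queue IDs of messages sent to a given recipient. The hash contents, the addresses, and the helper name ids_for_recipient are made up for illustration; only the shape of %msg follows the code above.

```perl
use strict;
use warnings;

# Hypothetical sample in the same shape as %msg above. The values carry no
# angle brackets or [ip] suffix because the regexp above strips them.
my %msg = (
    'BA1CE38965' => {
        to     => [ 'alice@example.com', 'bob@example.com' ],
        from   => [ 'carol@example.com' ],
        client => [ 'mail.example.com' ],
    },
    '62D8438973' => {
        to     => [ 'dave@example.com' ],
        from   => [ 'carol@example.com' ],
        client => [ 'localhost.localdomain' ],
    },
);

# Hypothetical helper: return the queue IDs (sorted) of every message that
# lists $rcpt among its recipients.
sub ids_for_recipient {
    my ($msgs, $rcpt) = @_;
    my @ids;
    for my $id (sort keys %$msgs) {
        my $to = $msgs->{$id}{to} || [];    # tolerate messages with no to= line
        push @ids, $id if grep { $_ eq $rcpt } @$to;
    }
    return @ids;
}

print "$_\n" for ids_for_recipient(\%msg, 'bob@example.com');
```

With the sample data this prints only BA1CE38965.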
Thanks to all who answered.
Upvotes: 3
Views: 1366
Reputation: 139711
Do it in a single pass:
#! /usr/bin/perl
use warnings;
use strict;
# for demo only
*ARGV = *DATA;
my %msg;
while (<>) {
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
        my $key = $1;
        push @{ $msg{$key}{$1} } => $2
            while /\b(to|from|client)=(.+?)(?:,|$)/g;
    }
}
use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__
Apr 8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]
Apr 8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<[email protected]>
Apr 8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<[email protected]>, size=1087, nrcpt=2 (queue active)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed
Apr 8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]
Apr 8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<[email protected]>
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<[email protected]>, size=1636, nrcpt=2 (queue active)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery)
Apr 8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0 <[email protected]> Queued mail for delivery)
Apr 8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed
The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.
Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on, with or without the separating comma.
Of note in the pattern are:
\b - matches on a word boundary only (prevents matching xxxto=<...>)
(to|from|client) - matches to or from or client
(.+?) - matches the field's value with a non-greedy quantifier
(?:,|$) - matches either a comma or the end of the string without capturing into $3
The non-greedy (.+?) forces the match to stop at the first comma it encounters rather than the last. Otherwise, on a line with
to=<[email protected]>, other=123
you'd get <[email protected]>, other=123 as the recipient!
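To see the difference for yourself, a small sketch that runs both quantifiers against the same text. The address is a placeholder, and the pattern is cut down to the to= field only.

```perl
use strict;
use warnings;

# Hypothetical remainder of a log line; the address is a placeholder.
my $rest = 'to=<user@example.com>, other=123';

# Non-greedy: the value ends at the first comma.
my ($lazy) = $rest =~ /\bto=(.+?)(?:,|$)/;

# Greedy: .+ swallows everything and (?:,|$) is satisfied by end of string.
my ($greedy) = $rest =~ /\bto=(.+)(?:,|$)/;

print "non-greedy: $lazy\n";
print "greedy:     $greedy\n";
```

The first line of output shows just the bracketed address; the second shows the address plus the trailing ", other=123".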
Then for each field matched, we push it onto the end of an array (because there may be multiple recipients, for example) connected to both the queue ID and field name. Take a look at the result:
$VAR1 = {
  '62D8438973' => {
    'client' => [
      'localhost.localdomain[127.0.0.1]'
    ],
    'to' => [
      '<[email protected]>',
      '<[email protected]>'
    ],
    'from' => [
      '<[email protected]>'
    ]
  },
  'BA1CE38965' => {
    'client' => [
      'mail.example.com[x.x.x.x]'
    ],
    'to' => [
      '<[email protected]>',
      '<[email protected]>'
    ],
    'from' => [
      '<[email protected]>'
    ]
  }
};
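If the push line looks like magic, this stripped-down sketch shows the same idea without the regexp. The triples are made-up stand-ins for what the pattern captures; the point is that Perl creates the nested hash and array the first time they are dereferenced (autovivification).

```perl
use strict;
use warnings;

my %msg;

# Hypothetical (queue ID, field, value) triples standing in for regexp matches.
my @matches = (
    [ 'BA1CE38965', 'to',   'first'  ],
    [ 'BA1CE38965', 'to',   'second' ],
    [ 'BA1CE38965', 'from', 'sender' ],
);

for my $m (@matches) {
    my ($key, $field, $value) = @$m;
    # Dereferencing the still-undefined slot as an array creates it on first
    # use, so no explicit initialization is needed.
    push @{ $msg{$key}{$field} }, $value;
}

print scalar @{ $msg{BA1CE38965}{to} }, "\n";    # prints 2
```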
Now say you want to print all the recipients of the message whose queue ID is BA1CE38965:
my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
    print $recip, "\n";
}
Maybe you want to know only how many recipients:
print scalar @{ $msg{$queueid}{to} }, "\n";
If you're willing to assume each message has exactly one client, access it with
print $msg{$queueid}{client}[0], "\n";
Upvotes: 5
Reputation: 54014
Common methods for manipulating the contents of an array:
# start over with this list for each example:
my @list = qw(a b c d);
splice:
splice @list, 2, 1, qw(e);
# @list now contains: qw(a b e d)
pop and shift:
pop @list;
# @list now contains: qw(a b c)
shift @list;
# @list now contains: qw(b c d)
map:
@list = map { $_ eq 'b' ? () : $_ } @list;
# list now contains: qw(a c d);
array slices:
@list[3..4] = qw(e f);
# list now contains: qw(a b c e f);
for and foreach loops:
foreach (@list)
{
    # $_ is aliased to each element of the list in turn;
    # assignments will be propagated back to the original structure
    $_ = uc if m/[a-c]/;
}
# list now contains: qw(A B C d);
Read about all these functions at perldoc perlfunc, slices in perldoc perldata, and for loops in perldoc perlsyn.
Upvotes: 0
Reputation: 27234
Why not do this:
my @extracted = map extract_data($_),
grep msg_rcpt_to( $rcpt, $_ ), @log_data;
When you are done, you'll have an array of extracted data in the same order it appeared in the log.
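To make that runnable, here is a sketch with stand-in definitions. The bodies of extract_data and msg_rcpt_to are my guesses at what such helpers might do (they are not defined in the question), and the log lines and addresses are invented.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the log line records a delivery to $rcpt.
sub msg_rcpt_to {
    my ($rcpt, $line) = @_;
    return $line =~ /\bto=<\Q$rcpt\E>/;
}

# Hypothetical helper: return the queue ID found on the line, or nothing.
sub extract_data {
    my ($line) = @_;
    return $line =~ m{postfix/\w+\[\d+\]: (\w+):} ? $1 : ();
}

my $rcpt     = 'bob@example.com';
my @log_data = (
    'Apr 8 14:22:04 host postfix/smtp[1]: AAA111: to=<bob@example.com>, status=sent',
    'Apr 8 14:22:04 host postfix/smtp[2]: BBB222: to=<eve@example.com>, status=sent',
    'Apr 8 14:22:05 host postfix/smtp[3]: CCC333: to=<bob@example.com>, status=sent',
);

# grep keeps the matching lines; map turns each kept line into extracted data.
my @extracted = map extract_data($_),
                grep msg_rcpt_to( $rcpt, $_ ), @log_data;

print "@extracted\n";
```

Nothing is removed from @log_data here; the filtering produces a new list instead.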
Upvotes: 1
Reputation: 95598
Assuming you have the index at hand, use splice:
splice(@array, $indextoremove, 1)
But be careful: once you remove an element, every element after it shifts down by one, so any index you saved for a later position is no longer valid.
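One common way around the shifting indexes, sketched here with made-up data, is to walk the array from the end, so each splice only moves elements you have already visited.

```perl
use strict;
use warnings;

# Hypothetical data: remove every element that starts with "drop".
my @lines = ('keep 1', 'drop 2', 'drop 3', 'keep 4');

# Iterate over the indexes from last to first. Removing element $i shifts
# only the elements after it, and those have been processed already.
for my $i (reverse 0 .. $#lines) {
    splice(@lines, $i, 1) if $lines[$i] =~ /^drop/;
}

print join(', ', @lines), "\n";
```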
Upvotes: 0
Reputation: 11594
In Perl you can use splice() to remove elements from an array.
As usual, be careful when deleting from an array while looping over it, because the indexes of the remaining elements change.
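If you would rather avoid index bookkeeping altogether, a different technique (not splice) is to build a new array with grep that holds only the elements you still need. A small sketch with invented data:

```perl
use strict;
use warnings;

# Hypothetical data: lines starting with to= have already been handled.
my @lines = ('to=<a>', 'from=<b>', 'to=<c>');

# Keep only the lines that still need processing; no indexes are involved.
my @remaining = grep { !/^to=/ } @lines;

print "@remaining\n";
```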
Upvotes: 0
Reputation: 273854
It won't actually make the processing faster, as removing from the middle of an array is an expensive operation: every element after the removed one has to be moved down.
Better options:
Upvotes: 4