Reputation: 15359

How can I remove an element from a Perl array after I've processed it?

I am reading a postfix mail log file into an array and then looping through it to extract messages. On the first pass, I'm checking for a match on the "to=" line and grabbing the message ID. After building an array of MSGIDs, I'm looping back through the array to extract information on the to=, from=, and client= lines.

What I'd like to do is remove a line from the array as soon as I've extracted the data from it in order to make the processing a bit faster (i.e. one less line to check against).

Any suggestions? This is in Perl.

Edit: gbacon's answer below was enough to get me rolling with a solid solution. Here's the guts of it:

my %msg;
while (<>) {
    my $line = $_;
    if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
            my $key = $1;
            push @{ $msg{$key}{$1} } => $2
                    while /\b(to|from|client|size|nrcpt)=<?(.+?)(?:>|,|\[|$)/g;
    }
    if ($line =~ s!^(\w+\s+\d+ \d+:\d+:\d+)\s(\w+.*)\s+postfix/\w+\[.+?\]: (\w+):\s*removed!!) {
            my $key = $3;
            push @{ $msg{$key}{date} } => $1;
            push @{ $msg{$key}{server} } => $2;
    }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;

I'm sure that second regexp can be made more impressive, but it gets the job done for what I need. I can now take the hash of all messages and pull out the ones I'm interested in.

Thanks to all who answered.

Upvotes: 3

Answers (6)

Greg Bacon

Reputation: 139711

Do it in a single pass:

#! /usr/bin/perl

use warnings;
use strict;

# for demo only
*ARGV = *DATA;

my %msg;
while (<>) {
  if (s!^.*postfix/\w+\[.+?\]: (\w+):\s*!!) {
    my $key = $1;
    push @{ $msg{$key}{$1} } => $2
      while /\b(to|from|client)=(.+?)(?:,|$)/g;
  }
}

use Data::Dumper;
$Data::Dumper::Indent = 1;
print Dumper \%msg;
__DATA__
Apr  8 14:22:02 MailSecure03 postfix/smtpd[32388]: BA1CE38965: client=mail.example.com[x.x.x.x]
Apr  8 14:22:03 MailSecure03 postfix/cleanup[32070]: BA1CE38965: message-id=<[email protected]>
Apr  8 14:22:03 MailSecure03 postfix/qmgr[19685]: BA1CE38965: from=<[email protected]>, size=1087, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32608]: BA1CE38965: to=<[email protected]>, relay=127.0.0.1[127.0.0.1]:10025, delay=1.7, delays=1/0/0/0.68, dsn=2.0.0, status=sent (250 OK, sent 49DC509B_360_15637_162D8438973)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: BA1CE38965: removed
Apr  8 14:22:04 MailSecure03 postfix/smtpd[32589]: 62D8438973: client=localhost.localdomain[127.0.0.1]
Apr  8 14:22:04 MailSecure03 postfix/cleanup[32080]: 62D8438973: message-id=<[email protected]>
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: from=<[email protected]>, size=1636, nrcpt=2 (queue active)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <[email protected]> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/smtp[32417]: 62D8438973: to=<[email protected]>, relay=y.y.y.y[y.y.y.y]:25, delay=0.19, delays=0.04/0/0.04/0.1, dsn=2.6.0, status=sent (250 2.6.0  <[email protected]> Queued mail for delivery)
Apr  8 14:22:04 MailSecure03 postfix/qmgr[19685]: 62D8438973: removed

The code works by first looking for a queue ID (e.g., BA1CE38965 and 62D8438973 above), which we store in $key.

Next, we find all matches on the current line (thanks to the /g switch) that look like to=<...>, client=mail.example.com, and so on—with and without the separating comma.

Of note in the pattern are

\b - matches on a word boundary only (prevents matching xxxto=<...>)
(to|from|client) - match to or from or client
(.+?) - matches the field's value with a non-greedy quantifier
(?:,|$) - matches either a comma or at end of string without capturing into $3

The non-greedy (.+?) forces the match to stop at the first comma it encounters rather than the last. Otherwise, on a line with

to=<[email protected]>, other=123

you'd get <[email protected]>, other=123 as the recipient!
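To see the difference, here is a minimal sketch that runs the greedy and non-greedy versions of the pattern against the same line. The address is a made-up placeholder, not taken from the log above.

```perl
use strict;
use warnings;

# Hypothetical sample line; single quotes keep the @ from interpolating.
my $line = 'to=<alice@example.com>, other=123';

# Non-greedy: the value stops at the first comma.
my ($lazy) = $line =~ /\bto=(.+?)(?:,|$)/;

# Greedy: the value runs to the end of the line, where $ then matches.
my ($greedy) = $line =~ /\bto=(.+)(?:,|$)/;

print "non-greedy: $lazy\n";
print "greedy:     $greedy\n";
```

The first capture holds only the bracketed address; the second also swallows the trailing `, other=123`.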

Then for each field matched, we push it onto the end of an array (because there may be multiple recipients, for example) connected to both the queue ID and field name. Take a look at the result:

$VAR1 = {
  '62D8438973' => {
    'client' => [
      'localhost.localdomain[127.0.0.1]'
    ],
    'to' => [
      '<[email protected]>',
      '<[email protected]>'
    ],
    'from' => [
      '<[email protected]>'
    ]
  },
  'BA1CE38965' => {
    'client' => [
      'mail.example.com[x.x.x.x]'
    ],
    'to' => [
      '<[email protected]>',
      '<[email protected]>'
    ],
    'from' => [
      '<[email protected]>'
    ]
  }
};

Now say you want to print all the recipients of the message whose queue ID is BA1CE38965:

my $queueid = "BA1CE38965";
foreach my $recip (@{ $msg{$queueid}{to} }) {
  print $recip, "\n";
}

Maybe you want to know only how many recipients:

print scalar @{ $msg{$queueid}{to} }, "\n";

If you're willing to assume each message has exactly one client, access it with

print $msg{$queueid}{client}[0], "\n";
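Going the other way, from a recipient to the messages addressed to it, might look like the sketch below. The queue IDs and addresses are invented, and the hash is only assumed to have the same shape as %msg above.

```perl
use strict;
use warnings;

# Hypothetical data in the same shape as the %msg hash built by the parser.
my %msg = (
  'AAAA000001' => { to   => ['<alice@example.com>', '<bob@example.com>'],
                    from => ['<carol@example.com>'] },
  'AAAA000002' => { to   => ['<bob@example.com>'],
                    from => ['<dave@example.com>'] },
  'AAAA000003' => { from => ['<erin@example.com>'] },  # no "to" line seen
);

my $wanted = '<bob@example.com>';

# Keep the queue IDs whose "to" list contains $wanted.
# The "|| []" covers messages that have no recipients recorded.
my @hits = grep {
  my $id = $_;
  grep { $_ eq $wanted } @{ $msg{$id}{to} || [] };
} sort keys %msg;

print "$_ from @{ $msg{$_}{from} }\n" for @hits;
```

Sorting the keys first just makes the output order predictable; drop it if order doesn't matter.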

Upvotes: 5

Ether

Reputation: 54014

Common methods for manipulating the contents of an array:

# start over with this list for each example:
my @list = qw(a b c d);

splice:

splice @list, 2, 1, qw(e);
# @list now contains: qw(a b e d)

pop and shift:

pop @list;
# @list now contains: qw(a b c)

shift @list;
# @list now contains: qw(b c d)

map:

@list = map { $_ eq 'b' ? () : $_ } @list;
# list now contains: qw(a c d);

array slices:

@list[3..4] = qw(e f);
# list now contains: qw(a b c e f);

for and foreach loops:

foreach (@list)
{
    # $_ is aliased to each element of the list in turn;
    # assignments will be propagated back to the original structure
    $_ = uc if m/[a-c]/;
}
# list now contains: qw(A B C d);

Read about all these functions at perldoc perlfunc, slices in perldoc perldata, and for loops in perldoc perlsyn.

Upvotes: 0

daotoad

Reputation: 27234

Why not do this:

my @extracted = map  extract_data($_), 
                grep msg_rcpt_to( $rcpt, $_ ), @log_data;

When you are done, you'll have an array of extracted data in the same order it appeared in the log.
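A runnable sketch of that pipeline follows. The bodies of msg_rcpt_to and extract_data are guesses written only for illustration (they are not defined in this answer), and the log lines are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the log line records delivery to $rcpt.
sub msg_rcpt_to {
  my ($rcpt, $line) = @_;
  return $line =~ /\bto=<\Q$rcpt\E>/;
}

# Hypothetical helper: here it just pulls out the queue ID.
sub extract_data {
  my ($line) = @_;
  my ($id) = $line =~ m{postfix/\w+\[\d+\]: (\w+):};
  return $id;
}

my $rcpt = 'bob@example.com';
my @log_data = (
  'Apr  8 14:22:02 host postfix/smtpd[1]: AAAA000001: client=mail.example.com[192.0.2.1]',
  'Apr  8 14:22:04 host postfix/smtp[2]: AAAA000001: to=<bob@example.com>, status=sent',
  'Apr  8 14:22:05 host postfix/smtp[3]: AAAA000002: to=<alice@example.com>, status=sent',
  'Apr  8 14:22:06 host postfix/smtp[4]: AAAA000003: to=<bob@example.com>, status=sent',
);

# grep selects the matching lines; map turns each one into extracted data.
my @extracted = map  { extract_data($_) }
                grep { msg_rcpt_to($rcpt, $_) } @log_data;

print "$_\n" for @extracted;
```

Nothing is removed from @log_data here; the lines that don't match are simply never passed to extract_data.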

Upvotes: 1

Vivin Paliath

Reputation: 95598

Assuming you have the index at hand, use splice:

splice(@array, $indextoremove, 1)

But be careful: once you remove an element, every element after it moves down by one, so any indexes you saved for those elements are no longer valid.
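One common way around the shifting indexes is to walk the array from the end toward the front. A minimal sketch, with placeholder values standing in for processed and unprocessed lines:

```perl
use strict;
use warnings;

# Placeholder data: 'drop' marks elements that have been processed.
my @array = qw(keep drop keep drop drop keep);

# Go from the last index down to 0. Removing element $i only moves
# elements at higher indexes, and those have already been visited.
for my $i (reverse 0 .. $#array) {
  splice(@array, $i, 1) if $array[$i] eq 'drop';
}

print "@array\n";
```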

Upvotes: 0

Ken Aspeslagh

Reputation: 11594

In Perl, you can use the splice() function to remove elements from an array.

As usual, be careful when deleting from an array while you are looping over it, because the indexes of the remaining elements change.
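If tracking indexes is more trouble than it's worth, grep can rebuild the array without the processed lines instead. This swaps splice for a different technique; the sample lines and the @processed array are invented for the sketch.

```perl
use strict;
use warnings;

# Hypothetical, shortened log lines.
my @lines = (
  'AAAA000001: to=<alice@example.com>',
  'AAAA000001: removed',
  'AAAA000002: to=<bob@example.com>',
);

my @processed;

# grep builds a new list, so there are no indexes to keep in sync.
# A handled line makes the block return false and is left out.
@lines = grep {
  my $handled = /\bto=<(.+?)>/;
  push @processed, $1 if $handled;
  !$handled;
} @lines;

print "processed: @processed\n";
print "remaining: @lines\n";
```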

Upvotes: 0

Eli Bendersky

Reputation: 273854

It won't actually make the processing faster, as removing from the middle of an array is an expensive operation.

Better options:

Do everything in one pass
When you build the array of IDs, include pointers (indexes, really) into the main array so that you can access its elements quickly for a given ID
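The second option could be sketched like this. The %lines_for hash, the queue IDs, and the shortened log lines are all made up for the example.

```perl
use strict;
use warnings;

# Hypothetical, shortened log lines.
my @log = (
  'postfix/smtpd[1]: AAAA000001: client=mail.example.com[192.0.2.1]',
  'postfix/smtpd[2]: AAAA000002: client=localhost[127.0.0.1]',
  'postfix/smtp[3]: AAAA000001: to=<alice@example.com>, status=sent',
  'postfix/qmgr[4]: AAAA000001: removed',
);

# First pass: map each queue ID to the indexes of its lines in @log.
my %lines_for;
for my $i (0 .. $#log) {
  push @{ $lines_for{$1} }, $i
    if $log[$i] =~ m{postfix/\w+\[\d+\]: (\w+):};
}

# Later lookups touch only the lines that belong to the ID,
# and nothing ever has to be removed from @log.
my @wanted = @log[ @{ $lines_for{'AAAA000001'} } ];
print "$_\n" for @wanted;
```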

Upvotes: 4
