Jill448

Reputation: 1793

Perl hash of arrays issue

I have a few lines in my array @lines. A line in which the command is prefixed with * gives the start time of that command (like sync/fetch), and the line with the same process ID (pid) and the same command without * gives the end time. The start and end lines are not always consecutive. I would like to get the start date and end date for a particular process ID and cmd. For example, for usera the cmd sync with process ID 11859 started at 2015/01/13 13:53:01.491-05:00 and ended at 2015/01/13 13:55:01.492-05:00.

Below is my approach: I used a hash of arrays keyed on the process ID and split each line. This works only when the start and end lines of a command are consecutive. How can I make it work when they are not?

my %users;
foreach my $line (@lines) {
   if ($line =~ m{(\*)+}) {
        ($stdate, $sttime, $pid, $user, $cmd) = split ' ',   $line;
        $startdate ="$stdate $sttime";
   }
   else {
     ($eddate, $edtime, $pid, $user, $cmd) = split ' ',   $line;
     $enddate = "$eddate $edtime";
   }           

       $users{$pid} = [ $startdate, $enddate, $user, $cmd ];

    }

Content in @lines:

2015/01/13 13:53:01.491-05:00 11859 usera       *sync_cmd 7f1f9bfff700 10.101.17.111      
2015/01/13 13:57:02.079-05:00 11863 userb       *fetch_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:59:02.079-05:00 11863 userb       fetch_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:55:01.492-05:00 11859 usera       sync_cmd 7f1f9bfff700 10.101.17.111 

Upvotes: 0

Views: 52

Answers (2)

Sobrique

Reputation: 53478

I'm looking at your code and wondering why you're using a hash of arrays.

As far as I'm concerned, the purpose of an array is to hold a set of similar but ordered values. Here each field means something different, so a hash with named keys is a better fit.

Could you not instead do:

my %processes;

foreach (@lines) {
    my ( $date, $time, $pid, $user, $cmd, @everything_else ) = split;

    if ( $cmd =~ m/^\*/ ) {

        #if command starts with a * - it started.
        if ( defined $processes{$pid} ) {
            print "WARNING: $pid reused\n";
        }

        $processes{$pid}{'start_date'} = $date;
        $processes{$pid}{'start_time'} = $time;
        $processes{$pid}{'user'}       = $user;
        $processes{$pid}{'cmd'}        = $cmd;
    }
    else {
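        #cmd does not start with '*', so this line marks the end of a command.
        if ( exists $processes{$pid}
            and $processes{$pid}{'cmd'} =~ m/\Q$cmd\E/ )
        {

            #this works, because 'some_command' is a substring of '*some_command'.
            #\Q...\E stops regex metacharacters in $cmd being interpreted.
            $processes{$pid}{'end_date'} = $date;
            $processes{$pid}{'end_time'} = $time;
        }
        else {
            my $started = exists $processes{$pid} ? $processes{$pid}{'cmd'} : '(no start line seen)';
            print
                "WARNING: $pid has a command of $cmd, where it started with $started\n";
        }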
    }
}

You might want some additional validation tests, for example in case the log is long enough that pids get reused, or it doesn't include both the start and the finish of a particular process.
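As a rough sketch of one such validation test, the snippet below lists every pid that has a start line but no end line. It is a standalone simplification, not the code above: the key names (start, end, user, cmd) are shortened for illustration, and the third input line (pid 11900, userc) is invented to show an unfinished process.

```perl
use strict;
use warnings;

# Hypothetical input: the first two lines come from the question,
# the third is an invented start line with no matching end line.
my @lines = (
    '2015/01/13 13:53:01.491-05:00 11859 usera *sync_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:55:01.492-05:00 11859 usera sync_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:57:02.079-05:00 11900 userc *fetch_cmd 7f1f9bfff700 10.101.17.111',
);

my %processes;
foreach (@lines) {
    my ( $date, $time, $pid, $user, $cmd ) = split;

    # s/// returns true only if a leading '*' was found and removed.
    if ( $cmd =~ s/^\*// ) {
        $processes{$pid} = { start => "$date $time", user => $user, cmd => $cmd };
    }
    elsif ( exists $processes{$pid} ) {
        $processes{$pid}{end} = "$date $time";
    }
}

# Any pid with a start but no end never logged a finish line.
my @unfinished = grep { !defined $processes{$_}{end} } sort keys %processes;
print "No end line for pid $_ ($processes{$_}{cmd})\n" for @unfinished;
```

End lines whose pid was never started are silently skipped here; whether to warn about them instead depends on how much of the log you expect to have.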

Upvotes: 2

tjd

Reputation: 4104

When you assign to $users{$pid} you are presuming that the most recent $startdate and $enddate are both relevant. This problem is exacerbated by the fact that the variables holding your field values have a scope larger than the foreach loop, allowing these values to bleed between records.

In the if block, you should assign the values of $startdate, $user and $cmd to the array, individually or as a slice if you like. In the else block you should assign $enddate to its element in the array.

Regex extra credit: You don't seem to really care if there is more than one * in a record, making the + in the regex superfluous. As an added bonus, without it the capturing group is also of no value. m{\*} should do quite nicely.
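Putting those suggestions together, a minimal sketch might look like the following. It keeps your hash of arrays with the element order [ start, end, user, cmd ] and uses the sample lines from the question. Stripping the leading * from $cmd and the final print loop are my own additions for illustration.

```perl
use strict;
use warnings;

# Sample data copied from the question.
my @lines = (
    '2015/01/13 13:53:01.491-05:00 11859 usera       *sync_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:57:02.079-05:00 11863 userb       *fetch_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:59:02.079-05:00 11863 userb       fetch_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:55:01.492-05:00 11859 usera       sync_cmd 7f1f9bfff700 10.101.17.111',
);

my %users;
foreach my $line (@lines) {

    # Lexical variables, so nothing carries over from the previous record.
    my ( $date, $time, $pid, $user, $cmd ) = split ' ', $line;

    if ( $line =~ m{\*} ) {

        # Start line: fill elements 0, 2 and 3 with an array slice.
        $cmd =~ s/^\*//;
        @{ $users{$pid} }[ 0, 2, 3 ] = ( "$date $time", $user, $cmd );
    }
    else {

        # End line: only the end date (element 1) is touched.
        $users{$pid}[1] = "$date $time";
    }
}

foreach my $pid ( sort keys %users ) {
    my ( $start, $end, $user, $cmd ) = @{ $users{$pid} };
    print "$user $cmd ($pid): $start -> ", $end // 'unknown', "\n";
}
```

Because each branch writes only its own elements, it no longer matters whether the start and end lines of a pid are adjacent. The // operator needs Perl 5.10 or later.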

Upvotes: 1
