Reputation: 1793
I have a few lines in my array @lines in which a * marks the start time of a command (like sync/fetch), and the line with the same process ID (pid) and the same command without the * marks the end time. The start and end lines are not always adjacent. I would like to get the start date and end date for a particular process ID and cmd. For example, for usera, the cmd sync with process ID 11859 started at 2015/01/13 13:53:01.491-05:00 and ended at 2015/01/13 13:55:01.492-05:00.

Below is my approach, in which I used a hash of arrays with the process ID as the key and split each line. It works only when the start and end lines of a command are adjacent. How can I make it work when they are not?
my %users;
foreach my $line (@lines) {
    if ($line =~ m{(\*)+}) {
        ($stdate, $sttime, $pid, $user, $cmd) = split ' ', $line;
        $startdate ="$stdate $sttime";
    }
    else {
        ($eddate, $edtime, $pid, $user, $cmd) = split ' ', $line;
        $enddate = "$eddate $edtime";
    }
    $users{$pid} = [ $startdate, $enddate, $user, $cmd ];
}
Content in @lines:
2015/01/13 13:53:01.491-05:00 11859 usera *sync_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:57:02.079-05:00 11863 userb *fetch_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:59:02.079-05:00 11863 userb fetch_cmd 7f1f9bfff700 10.101.17.111
2015/01/13 13:55:01.492-05:00 11859 usera sync_cmd 7f1f9bfff700 10.101.17.111
Upvotes: 0
Views: 52
Reputation: 53478
I'm looking at your code and wondering why you're using a hash of arrays.
As far as I'm concerned, the purpose of an array is to hold a set of similar but ordered values.
Could you not use a hash of hashes instead:
my %processes;
foreach (@lines) {
    my ( $date, $time, $pid, $user, $cmd, @everything_else ) = split;
    if ( $cmd =~ m/^\*/ ) {
        #if command starts with a * - it started.
        if ( defined $processes{$pid} ) {
            print "WARNING: $pid reused\n";
        }
        $processes{$pid}{'start_date'} = $date;
        $processes{$pid}{'start_time'} = $time;
        $processes{$pid}{'user'}       = $user;
        $processes{$pid}{'cmd'}        = $cmd;
    }
    else {
        #cmd does not start with '*'.
        #\Q...\E stops regex metacharacters in $cmd from being interpreted.
        if ( $processes{$pid}{'cmd'} =~ m/\Q$cmd\E/ ) {
            #this works, because 'some_command' is a substring of '*some_command'.
            $processes{$pid}{'end_date'} = $date;
            $processes{$pid}{'end_time'} = $time;
        }
        else {
            print
                "WARNING: $pid has a command of $cmd, where it started with $processes{$pid}{'cmd'}\n";
        }
    }
}
You might want some additional validation, e.g. for a log long enough that pids get reused, or for a log that doesn't include both the start and finish of a particular process.
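One of those extra checks might look something like the sketch below. It assumes a %processes hash shaped like the one built above (a start_date and, once finished, an end_date per pid). The helper name unfinished_pids and the sample data are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the code above): returns the pids that
# have a recorded start but no recorded end, sorted numerically.
sub unfinished_pids {
    my ($processes) = @_;
    return sort { $a <=> $b }
        grep { defined $processes->{$_}{start_date}
            && !defined $processes->{$_}{end_date} } keys %$processes;
}

# Sample data shaped like %processes; pid 11863 never logged a finish.
my %processes = (
    11859 => { start_date => '2015/01/13', end_date => '2015/01/13' },
    11863 => { start_date => '2015/01/13' },
);

print "WARNING: $_ started but never finished\n"
    for unfinished_pids( \%processes );
```

Running a check like this after the main loop keeps the parsing code simple and reports incomplete records in one place.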
Upvotes: 2
Reputation: 4104
When you assign to $users{$pid}, you are presuming that the most recent $startdate and $enddate are both relevant. This problem is exacerbated by the fact that the variables holding your field values have a scope larger than the foreach loop, allowing these values to bleed between records.

In the if block, you should assign the values of $startdate, $user, $cmd to the array, individually or as a slice if you like. In the else block, you should assign $enddate to its element in the array.
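A minimal sketch of that change, keeping the hash of arrays from the question, could look like this. The sample @lines are copied from the question. Stripping the leading * from the stored command and the final report loop are additions for illustration only, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

my @lines = (
    '2015/01/13 13:53:01.491-05:00 11859 usera *sync_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:57:02.079-05:00 11863 userb *fetch_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:59:02.079-05:00 11863 userb fetch_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:55:01.492-05:00 11859 usera sync_cmd 7f1f9bfff700 10.101.17.111',
);

my %users;
foreach my $line (@lines) {
    # Declared inside the loop so nothing carries over between records.
    my ( $date, $time, $pid, $user, $cmd ) = split ' ', $line;

    if ( $line =~ m{\*} ) {
        # Start record: fill the start date, user and command slots (0, 2, 3).
        $cmd =~ s{^\*}{};    # optional: store the command name without the marker
        @{ $users{$pid} }[ 0, 2, 3 ] = ( "$date $time", $user, $cmd );
    }
    else {
        # End record: only the end-date slot (1) is touched.
        $users{$pid}[1] = "$date $time";
    }
}

foreach my $pid ( sort { $a <=> $b } keys %users ) {
    my ( $start, $end, $user, $cmd ) = @{ $users{$pid} };
    print "$pid $user $cmd: $start -> ", $end // 'no end seen', "\n";
}
```

Because each branch writes only its own slots, the order in which start and end lines appear no longer matters.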
Regex extra credit: You don't seem to really care if there is more than one * in a record, making the + in the regex superfluous. As an added bonus, without it the capturing group is also of no value. m{\*} should do quite nicely.
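As a quick sanity check, the sketch below runs both patterns over a start line and an end line taken from the question and counts any disagreement. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @samples = (
    '2015/01/13 13:53:01.491-05:00 11859 usera *sync_cmd 7f1f9bfff700 10.101.17.111',
    '2015/01/13 13:55:01.492-05:00 11859 usera sync_cmd 7f1f9bfff700 10.101.17.111',
);

my $mismatches = 0;
for my $line (@samples) {
    # Pattern from the question versus the simplified one.
    my $original   = $line =~ m{(\*)+} ? 1 : 0;
    my $simplified = $line =~ m{\*}    ? 1 : 0;
    $mismatches++ if $original != $simplified;
    print "original=$original simplified=$simplified\n";
}
print "mismatches: $mismatches\n";
```

Both patterns succeed on exactly the lines that contain at least one *, so used as a plain true/false test they are interchangeable.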
Upvotes: 1