MrTunaDeluxe

Reputation: 152

Open the latest log file and print lines later than a certain timestamp

I'm writing a Perl script and I need to capture some lines from a garbage collection log and write them to a file.

The log is located on a remote host and I'm connecting using the Net::OpenSSH module.

I need to read the latest log file available.

In the shell I can locate the latest log with the following commands:

cd builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin
ls -lt | grep '\.log$' | head -1

Which will return the latest log:

-rw-r--r--   1 load     other    2406173 Jul 11 11:53 18156.stdout.log

So in Perl I'd like to be able to write something that locates and opens that log for reading.

Once I have that log file, I want to print all lines with a timestamp later than a specified cutoff. The cutoff is the timestamp of the latest log message minus a $Runtime variable.

Here are the last messages of the garbage collection log:

                                      ...

73868.629: [GC [PSYoungGen: 941984K->14720K(985216K)] 2118109K->1191269K(3065984K), 0.2593295 secs] [Times: user=0.62 sys=0.00, real=0.26 secs]
73873.053: [GC [PSYoungGen: 945582K->12162K(989248K)] 2122231K->1189934K(3070016K), 0.2329005 secs] [Times: user=0.60 sys=0.01, real=0.23 secs]

So if $Runtime had a value of 120 seconds, I would need to print all the lines with a timestamp of 73873.053 - 120 = 73753.053 seconds or later.

In the end my script would look something like this...

open GARB, ">", "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";

my $ssh2 = Net::OpenSSH->new(
  $pathHost,
  user => $pathUser,
  password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error; 

# Something to find and open the log file.
print GARB #Something to return certain lines.
close GARB;

I realize this is somewhat similar to this question, but I can't think of a way to tailor it to what I'm looking for. Any help is greatly appreciated!

Upvotes: 2

Views: 1135

Answers (5)

salva

Reputation: 10234

Use SFTP to access the remote filesystem. You can use Net::SFTP::Foreign (alone or via Net::OpenSSH).

It will allow you to list the contents of the remote filesystem, pick the file you want to process, open it and manipulate it as a local file.

The only tricky part is reading the lines backward, since the lines you want are at the end of the file: for instance, read chunks of the file starting from the end and split them into lines.
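
The backward, chunk-by-chunk reading described above can be sketched roughly as follows. This is only an illustration, not part of either module: lines_in_last_window is a hypothetical helper name, and it assumes the handle supports Perl's built-in seek, tell and read the way an ordinary local filehandle does (check the Net::SFTP::Foreign documentation before relying on that for a remote handle). It also assumes every relevant line starts with a "seconds:" timestamp, as in the question's sample.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): returns the lines whose leading
# "seconds:" timestamp is within $runtime seconds of the last line's
# timestamp, reading the file backward in chunks.
sub lines_in_last_window {
    my ($fh, $runtime, $chunk_size) = @_;
    $chunk_size ||= 8192;

    seek $fh, 0, 2 or die "seek failed: $!";    # jump to the end of the file
    my $pos    = tell $fh;
    my $buffer = '';    # possibly incomplete line carried over between chunks
    my @kept;           # matching lines, oldest first
    my $threshold;      # set from the first (that is, last) timestamp seen

    while ($pos > 0) {
        my $len = $pos < $chunk_size ? $pos : $chunk_size;
        $pos -= $len;
        seek $fh, $pos, 0 or die "seek failed: $!";
        my $got = read $fh, my $chunk, $len;
        die "read failed" unless defined $got && $got == $len;

        # Split into lines, keeping the newlines.
        my @lines = split /^/, $chunk . $buffer;

        # Unless we are at the start of the file, the first piece may be an
        # incomplete line, so hold it back for the next chunk.
        $buffer = $pos > 0 ? shift @lines : '';

        for my $line (reverse @lines) {
            my ($ts) = $line =~ /^(\d+(?:\.\d+)?):/ or next;
            $threshold = $ts - $runtime unless defined $threshold;
            return @kept if $ts < $threshold;    # older than the window: done
            unshift @kept, $line;
        }
    }
    return @kept;
}
```

Called as lines_in_last_window($fh, 120), it stops reading as soon as it meets a line older than the window, so only the tail of a large log is transferred.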

Upvotes: 0

Axeman

Reputation: 29854

I think that the page for Net::OpenSSH gives a pretty good baseline for this:

my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
  die "pipe_out method failed: " . $ssh->error;

while (<$rout>) { print }
close $rout;

But instead, you want to do some discarding work:

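my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
  die "pipe_out method failed: " . $ssh->error;

# $start and $duration are assumed to be set earlier, in log seconds.
while ( my $line = <$rout> ) {
    # Skip lines that don't begin with a "seconds:" timestamp.
    my ($ts) = $line =~ /^(\d+(?:\.\d+)?):/ or next;
    next if $ts < $start;               # discard everything before the window
    last if $ts > $start + $duration;   # stop once we are past the window
    print $line;
}
close $rout;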

Upvotes: 2

zostay

Reputation: 3995

You will need to either keep an accumulator containing all the lines (more memory) or iterate through the log more than once (more time).

With an accumulator:

my @accumulated_lines;
while (<$log_fh>) {
    push @accumulated_lines, $_;

    # Your processing to get $Runtime goes here...

    if ($Runtime > $TOO_BIG) {
        my ($current_timestamp) = /^(\d+(?:\.\d*))/;
        my $start_timestamp = $current_timestamp - $Runtime;

        for my $previous_line (@accumulated_lines) {
            # Match against $previous_line, not the current line in $_
            my ($previous_timestamp) = $previous_line =~ /^(\d+(?:\.\d*))/;
            next unless $previous_timestamp <= $current_timestamp;
            next unless $previous_timestamp >= $start_timestamp;
            print $previous_line;
        }
    }
}

Or you can iterate through the log twice, which is similar, but without the nested loop. I've assumed you might have more than one of these spans in your log.

my @report_spans;
while (<$log_fh>) {

    # Your processing to get $Runtime goes here...

    if ($Runtime > $TOO_BIG) {
        my ($current_timestamp) = /^(\d+(?:\.\d*))/;
        my $start_timestamp = $current_timestamp - $Runtime;

        push @report_spans, [ $start_timestamp, $current_timestamp ];
    }
}

# Don't bother continuing if there's nothing to report
exit 0 unless @report_spans;

# Start over
seek $log_fh, 0, 0;

while (<$log_fh>) {
    my ($previous_timestamp) = /^(\d+(?:\.\d*))/;
    SPAN: for my $span (@report_spans) {
        my ($start_timestamp, $current_timestamp) = @$span;

        next unless $previous_timestamp <= $current_timestamp;
        next unless $previous_timestamp >= $start_timestamp;
        print; # same as print $_;

        last SPAN; # don't print out the line more than once, if that's even possible
    }
}

If you might have overlapping spans, the latter has the advantage of not showing the same log lines twice. If you don't have overlapping spans, you could optimize the top one by resetting the accumulator every time you output:

@accumulated_lines = ();

which would save memory.
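
If only the final window matters (the last $Runtime seconds of the log, as in the question), a sliding window is a third option that reads the log once and keeps only the lines currently inside the window. The following is a minimal sketch rather than a drop-in solution: last_window is a hypothetical name, and it assumes the timestamps never decrease and that each relevant line starts with "seconds:".

```perl
use strict;
use warnings;

# Hypothetical helper: single pass over the log, keeping only the lines
# whose timestamp is within $runtime seconds of the newest one seen so far.
# Assumes timestamps never decrease from one line to the next.
sub last_window {
    my ($fh, $runtime) = @_;
    my @window;    # [timestamp, line] pairs, oldest first

    while (my $line = <$fh>) {
        my ($ts) = $line =~ /^(\d+(?:\.\d+)?):/ or next;
        push @window, [ $ts, $line ];

        # Drop entries that have fallen out of the window. The entry just
        # pushed is never dropped, so @window stays non-empty here.
        shift @window while $window[0][0] < $ts - $runtime;
    }
    return map { $_->[1] } @window;
}
```

Memory use is then bounded by the number of lines in one window rather than by the size of the whole log.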

Upvotes: 1

simbabque

Reputation: 54323

Here's an untested approach. I've not used Net::OpenSSH so there might be better ways to do it. I'm not even sure it works. What does work is the parsing part which I have tested.

use strict; use warnings;
use Net::OpenSSH;

# $pathHost, $pathUser and $pathPassword come from the rest of your script.
my $Runtime = 120;
my $now = time;
my $logDir = 'builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin';

open my $garb, '>', 
  "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";
my $ssh2 = Net::OpenSSH->new(
  $pathHost,
  user => $pathUser,
  password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error;

# Find the newest log file: ls -t lists newest first, so take the first match.
my $filename = $ssh2->capture("ls -t $logDir | grep '\\.log\$' | head -1");
$ssh2->error and die "Couldn't list the log files: ". $ssh2->error;
chomp $filename;

# Find the time of the last log line
my $latestTimeCapture = $ssh2->capture("tail -n 1 $logDir/$filename");
$latestTimeCapture =~ m/^([\d\.]+):/
  or die "No timestamp found on the last log line";
my $logTime = $1 - $Runtime;

# Stream the file through the SSH connection and read it line by line.
my ($out, $pid) = $ssh2->pipe_out("cat $logDir/$filename")
  or die "pipe_out failed: ". $ssh2->error;
while (<$out>) {
  # Keep only the lines newer than the cutoff.
  if (m/^([\d\.]+):/ && $1 > $logTime) {
    print $garb $_; # Assume the \n is still in there
  }
}
close $out;

waitpid($pid, 0);

close $garb;

It uses a variant of your ls line to look up the file with the capture method (ls -t lists the newest file first, so head -1 rather than tail -1 picks the latest log). It then opens a pipe through the SSH connection and reads that file with cat. $out is a filehandle to that pipe which we can read.

Since we are going to process the file line by line, starting at the top, we first need to grab the last line to get the last timestamp. That is done with tail and, again, the capture method.

Once we have that, we read from the pipe line by line. This is now a simple regex (the same one used above): grab the timestamp and compare it to the cutoff computed earlier (the last timestamp minus the 120 seconds). If it is higher, print the line to the output filehandle.

The docs say we have to use waitpid on the $pid returned from $ssh2->pipe_out so it reaps the subprocess, so we do that before closing our output file.

Upvotes: 1

Len Jaffe

Reputation: 3484

Find the latest file and feed it to perl:

 LOGFILE=`ls -t1 "$DIR" | grep '\.log$' | head -1`
 if [ -z "$LOGFILE" ]; then
   echo "$0: No log file found - exiting"
   exit 1
 fi

 perl myscript.pl "$DIR/$LOGFILE"

The pipe in the first line lists the files in the directory, name only, in one column, most recent first; filters for log files; and then returns only the first one. Because ls prints bare names, the script is given the name prefixed with $DIR.

I have no idea how to translate your timestamps into something I can understand and do math and comparisons on, but in general:

$threshold_ts = $time_specified - $offset;
while (<>) {
  my ($line_ts) = split(/\s/, $_, 2);
  print if compare_time_stamps($line_ts, $threshold_ts);
}

Writing the threshold manipulation and comparison is left as an exercise for the reader.
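
For what it's worth, here is one way that exercise might go. It is a sketch under an assumption, not a definitive answer: the sample timestamps in the question look like plain seconds (as a JVM prints them, counted from JVM start), so the leading number can be compared numerically. The body of compare_time_stamps below is my own illustration; it returns true for timestamps at or after the threshold.

```perl
use strict;
use warnings;

# Illustrative implementation of compare_time_stamps, assuming the first
# field of each line looks like "73873.053:" (seconds, then a colon).
# Returns 1 if the line's timestamp is at or after the threshold, else 0.
sub compare_time_stamps {
    my ($line_ts, $threshold_ts) = @_;
    return 0 unless defined $line_ts;

    # Strip the trailing colon; reject fields that are not a plain number.
    my ($seconds) = $line_ts =~ /^(\d+(?:\.\d+)?):?$/ or return 0;

    return $seconds >= $threshold_ts ? 1 : 0;
}
```

With the question's numbers, the threshold would be 73873.053 - 120, and any line whose first field is not a timestamp is simply skipped.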

Upvotes: 2
