Reputation: 152
I'm writing a Perl script and I need to capture some lines from a garbage collection log and write them to a file.
The log is located on a remote host and I'm connecting using the Net::OpenSSH module.
I need to read the latest log file available.
In the shell I can locate the latest log with the following commands:
cd builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin
ls -lat | grep '.log$' | tail -1
Which will return the latest log:
-rw-r--r-- 1 load other 2406173 Jul 11 11:53 18156.stdout.log
So in Perl I'd like to be able write something that locates and opens that log for reading.
When I have that log file, I want to print all lines that have a timestamp greater than a specified cutoff. The cutoff is a $Runtime variable subtracted from the timestamp of the latest log message.
Here are the last messages of the garbage collection log:
...
73868.629: [GC [PSYoungGen: 941984K->14720K(985216K)] 2118109K->1191269K(3065984K), 0.2593295 secs] [Times: user=0.62 sys=0.00, real=0.26 secs]
73873.053: [GC [PSYoungGen: 945582K->12162K(989248K)] 2122231K->1189934K(3070016K), 0.2329005 secs] [Times: user=0.60 sys=0.01, real=0.23 secs]
So if $Runtime had a value of 120 seconds, I would need to print all the lines from timestamp (73873.053 - 120) onwards.
In the end my script would look something like this...
open GARB, ">", "./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";
my $ssh2 = Net::OpenSSH->new(
$pathHost,
user => $pathUser,
password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error;
# Something to find and open the log file.
print GARB #Something to return certain lines.
close GARB;
I realize this is somewhat similar to this question, but I can't think of a way to tailor it to what I'm looking for. Any help is greatly appreciated!
Upvotes: 2
Views: 1135
Reputation: 10234
Use SFTP to access the remote filesystem. You can use Net::SFTP::Foreign (alone or via Net::OpenSSH).
It will allow you to list the contents of the remote filesystem, pick the file you want to process, open it and manipulate it as a local file.
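A minimal sketch of that flow, assuming password authentication and the directory layout from the question (untested; host and credentials are placeholders):

```perl
use strict;
use warnings;
use Net::SFTP::Foreign;

# Hypothetical host/credentials; substitute your own.
my $sftp = Net::SFTP::Foreign->new('remotehost',
                                   user     => 'load',
                                   password => 'secret');
$sftp->die_on_error("SSH connection failed");

my $dir = 'builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin';

# ls() returns an arrayref of entries; each carries a filename and an
# attributes object ('a') with the mtime. Pick the newest *.log file.
my $logs = $sftp->ls($dir, wanted => qr/\.log$/)
    or die "ls failed: " . $sftp->error;
my ($latest) = sort { $b->{a}->mtime <=> $a->{a}->mtime } @$logs;
die "no log files found" unless $latest;

# open() returns a filehandle that reads like a local one.
my $fh = $sftp->open("$dir/$latest->{filename}")
    or die "open failed: " . $sftp->error;
while (my $line = <$fh>) {
    # filter lines here...
}
```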
The only tricky thing you would need to do is read the lines backward, for instance by reading chunks of the file starting from the end and breaking them into lines.
Upvotes: 0
Reputation: 29854
I think that the documentation page for Net::OpenSSH gives a pretty good baseline for this:
my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
die "pipe_out method failed: " . $ssh->error;
while (<$rout>) { print }
close $rout;
But instead, you want to do some discarding work:
my ($rout, $pid) = $ssh->pipe_out("cat /tmp/foo") or
die "pipe_out method failed: " . $ssh->error;
while ( my $line = <$rout> ) {
    my $ts = substr( $line, 0, index( $line, ':' ) );
    next if $ts < $start;                 # before the window: discard
    last if $ts > $start + $duration;     # past the window: stop
    print $line;
}
close $rout;
Upvotes: 2
Reputation: 3995
You will need to either keep an accumulator containing all the lines (more memory) or iterate through the log more than once (more time).
With an accumulator:
my @accumulated_lines;
while (<$log_fh>) {
push @accumulated_lines, $_;
# Your processing to get $Runtime goes here...
if ($Runtime > $TOO_BIG) {
my ($current_timestamp) = /^(\d+(?:\.\d*)?)/;
my $start_timestamp = $current_timestamp - $Runtime;
for my $previous_line (@accumulated_lines) {
my ($previous_timestamp) = $previous_line =~ /^(\d+(?:\.\d*)?)/;
next unless $previous_timestamp <= $current_timestamp;
next unless $previous_timestamp >= $start_timestamp;
print $previous_line;
}
}
}
Or you can iterate through the log twice, which is similar, but without the nested loop. I've assumed you might have more than one of these spans in your log.
my @report_spans;
while (<$log_fh>) {
# Your processing to get $Runtime goes here...
if ($Runtime > $TOO_BIG) {
my ($current_timestamp) = /^(\d+(?:\.\d*)?)/;
my $start_timestamp = $current_timestamp - $Runtime;
push @report_spans, [ $start_timestamp, $current_timestamp ];
}
}
# Don't bother continuing if there's nothing to report
exit 0 unless @report_spans;
# Start over
seek $log_fh, 0, 0;
while (<$log_fh>) {
my ($previous_timestamp) = /^(\d+(?:\.\d*)?)/;
SPAN: for my $span (@report_spans) {
my ($start_timestamp, $current_timestamp) = @$span;
next unless $previous_timestamp <= $current_timestamp;
next unless $previous_timestamp >= $start_timestamp;
print; # same as print $_;
last SPAN; # don't print out the line more than once, if that's even possible
}
}
If you might have overlapping spans, the latter has the advantage of not showing the same log lines twice. If you don't have overlapping spans, you could optimize the top one by resetting the accumulator every time you output:
my @accumulator = ();
which would save memory.
Upvotes: 1
Reputation: 54323
Here's an untested approach. I've not used Net::OpenSSH, so there might be better ways to do it; I'm not even sure it works. What does work is the parsing part, which I have tested.
use strict; use warnings;
use Net::OpenSSH;
my $Runtime = 120;
my $now = time;
open my $garb, '>',
"./report/archive/test-$now/GC.txt" or die "Unable to create file: $!";
my $ssh2 = Net::OpenSSH->new(
$pathHost,
user => $pathUser,
password => $pathPassword
);
$ssh2->error and die "Couldn't establish SSH connection: ". $ssh2->error;
# Something to find and open the log file.
my $fileCapture = $ssh2->capture(
q~ls -lat builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin |grep '.log$' |tail -1~
);
$fileCapture =~ m/(\S+)\s*$/; # Look for the file name (last token)
my $filename = $1;            # And save it in $filename
# Find the time of the last log line
my $latestTimeCapture = $ssh2->capture(
"tail -n 1 builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin/$filename");
$latestTimeCapture =~ m/^([\d.]+):/
    or die "Could not parse timestamp from: $latestTimeCapture";
my $logTime = $1 - $Runtime;
my ($in, $out, $pid) = $ssh2->open2(
    "cat builds/5.7.1/5.7.1.126WRF_B/jboss-4.2.3/bin/$filename");
while (<$out>) {
    # Something to return certain lines.
    if (m/^([\d.]+):/ && $1 > $logTime) {
        print $garb $_; # Assume the \n is still in there
    }
}
waitpid $pid, 0;
close $garb;
It uses your ls line to look up the file with the capture method. It then opens a pipe through the SSH tunnel and reads that file through it.

Since we are going to process the file line by line, starting at the top, we first need to grab the last line to get the final timestamp. That is done with tail and, again, the capture method.

Once we have that, we read from the pipe line by line. This is now a simple regex (the same one used above): grab the timestamp and compare it to the cutoff computed earlier (the last timestamp minus the 120 seconds). If it is higher, print the line to the output filehandle.

The docs say we have to use waitpid on the $pid returned from $ssh2->open2 so that the subprocess gets reaped, so we do that before closing our output file.
Upvotes: 1
Reputation: 3484
Find the latest file and feed it to perl:
LOGFILE=`ls -t1 $DIR | grep '\.log$' | head -1`
if [ -z "$LOGFILE" ]; then
    echo "$0: No log file found - exiting"
    exit 1
fi
perl myscript.pl "$LOGFILE"
The pipe in the first line lists the files in the directory, names only, in one column, most recent first; filters for log files; and then returns only the first one.
I have no idea how to translate your timestamps into something I can understand and do math and comparisons upon, but in general:
$threshold_ts = $time_specified - $offset;
while (<>) {
my ($line_ts) = split(/\s/, $_, 2);
print if compare_time_stamps($line_ts, $threshold_ts);
}
Writing the threshold manipulation and comparison is left as an exercise for the reader.
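Since the GC timestamps are plain seconds (73873.053 and so on), the comparison can be numeric; the only wrinkle is that the first whitespace-separated field keeps its trailing colon. A sketch of the comparison helper the snippet above leaves out (the name is taken from that snippet):

```perl
use strict;
use warnings;

# The first field of a GC log line looks like "73873.053:"; strip the
# trailing colon so the value can be compared numerically.
sub compare_time_stamps {
    my ($line_ts, $threshold_ts) = @_;
    $line_ts =~ s/:\z//;
    return $line_ts >= $threshold_ts;
}
```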
Upvotes: 2