Reputation: 3222
I have a script which takes a list of logs from a log directory and performs some operations on them.
I have a performance issue with these operations, because I have a large set of log
files and need to process each one. The script is currently run from cron every hour. I want to rewrite this file-reading logic (@content) or otherwise increase the performance of this script so that the file operations run faster than they do now.
Here is the script:
#!/usr/bin/perl
use strict;
use warnings;
.
.
my $LogPath = "/path/to/log/file/";
my $list = `ls -1 $LogPath*.log`;
my @array_list = split(/\n/, $list);
foreach my $file (@array_list) {
    my $cat = `cat $file`;
    my @content = split(/\n/, $cat);
    foreach my $line (@content) {
        ....
        # Doing some operation if the matching content is found
        ....
        ....
    }
}
Any suggestions on modifying this logic so that I can read each line of each log file faster would be highly appreciated.
Upvotes: 0
Views: 675
Reputation: 11
You may consider/test using the module
File::Slurp (https://metacpan.org/pod/File::Slurp).
Seriously, don't call external commands in a loop - that kills performance!
Slurping is a convenient way to get text-file data into a variable.
Depending on the expected memory usage, the buf_ref option
may be your option of choice for the read_file
function. Combined with the fork suggestion in another answer, you can speed up your log ingestion quite a bit.
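A minimal sketch of that approach, assuming the per-line matching stays as in the question (the inner loop body is a placeholder):
use strict;
use warnings;
use File::Slurp qw(read_file);

my $LogPath = "/path/to/log/file/";

for my $file (glob("$LogPath*.log")) {
    # Slurp the whole file into a scalar through buf_ref instead of shelling out to cat
    my $cat;
    read_file($file, buf_ref => \$cat);
    for my $line (split /\n/, $cat) {
        # ... do the matching work on $line here ...
    }
}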
Upvotes: 0
Reputation: 3777
If you're doing some filtering, which your comment in the innermost foreach suggests, and you're ignoring most of the lines in the logs, then you could try to replace my $cat = `cat $file`;
with my $cat = `grep PATTERN $file`;
to at least ease Perl's memory usage if the files are big. They may even be so big that they cause disk swapping from not having enough memory, in which case that is the real problem with your perl code. In many if not most versions of grep, PATTERN can also be a Perl-style regexp with the -P
option: grep -P 'REGEXP' file.
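In the loop from the question, that substitution would look roughly like this (PATTERN is just a placeholder for whatever you actually match):
foreach my $file (@array_list) {
    # grep discards non-matching lines before Perl ever sees them
    my @content = split /\n/, `grep -P 'PATTERN' $file`;
    foreach my $line (@content) {
        # ... operation on the already-filtered lines ...
    }
}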
If the slowness is, say, 99% IO (disk reads and/or writes, which you can find out by running time perl script.pl
and checking whether real
in the output of time
is much larger than the other numbers), then there probably isn't much you can do, unless your system can produce compressed log files. Sometimes, if you have a slow disk (maybe a network-mounted disk) and fast CPUs, decompressing+processing can be faster than just processing uncompressed files. Perhaps like this: my $cat = ` zcat $file.gz | grep PATTERN `;
Also you could try parallelizing with fork
by adding this outer for-loop:
my $LogPath = "/path/to/log/file";
my $list = `ls -1 $LogPath/*.log`;
my $jobs = 4; # split work into 4 jobs (up to 4 CPUs work in parallel)
for my $job (1..$jobs) {
    next if !fork;  # the new child moves on to set up the next job; the current process handles this one
    my $i = 0;
    my @array_list = grep $i++ % $jobs == $job-1,  # do only the files this process should
                     split(/\n/, $list);
    foreach my $file (@array_list) {
        my $cat = `cat $file`;
        my @content = split(/\n/, $cat);
        foreach my $line (@content) {
            ....
            # Doing some operation if the matching content is found
            ....
            ....
        }
    }
    last;  # this process has finished its share
}
(By the way, for and foreach are synonymous; I don't know why so many perl coders bother with the last four chars of foreach.)
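One caveat, not part of this answer: with that pattern the processes never wait for each other, so cron may see the script exit while forked children are still working, and the final child falls through the loop doing no file work. If that matters, a minimal alternative sketch using plain fork and waitpid (same modulus split, $LogPath as above, the per-file work elided) could look like this:
my @files = glob("$LogPath/*.log");
my $jobs  = 4;
my @pids;
for my $job (0 .. $jobs-1) {
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {                  # child: handle every $jobs-th file
        my $i = 0;
        for my $file (grep { $i++ % $jobs == $job } @files) {
            # ... same per-file work as in the question ...
        }
        exit 0;
    }
    push @pids, $pid;                 # parent: remember the child
}
waitpid($_, 0) for @pids;             # wait until every child has finished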
Upvotes: 0
Reputation: 385657
Start by using system calls (Perl's built-in glob and open) instead of external programs (ls and cat) to get the info you want.
my $log_dir_qfn = "/path/to/log/file";

my $error = 0;
for my $log_qfn (glob(quotemeta($log_dir_qfn) . "/*.log")) {
    open(my $fh, '<', $log_qfn)
        or do {
            warn("Can't open \"$log_qfn\": $!\n");
            ++$error;
            next;
        };

    while ( my $line = <$fh> ) {
        ...
    }
}

exit(1) if $error;
Not sure how much faster this is going to be. And there's not much that can be done to speed up what you are doing in the portion of the code you posted. If you want to read a file line by line, it's going to take the time it takes to read the file line by line.
Upvotes: 2