vkk05

Reputation: 3222

Perl: increase performance while doing file IO

I have a script which takes a list of logs from a log directory and does some operations on them.

I have a performance issue while doing these operations, because I have a large set of log files and need to operate on each of them. Currently my script is set up in cron and runs every hour. So I want to rewrite this file-reading logic (@content), or otherwise increase the performance of this script so it can do the file operations faster than the current process.

Here is the script:

#!/usr/bin/perl

use strict;
use warnings;
# ...

my $LogPath = "/path/to/log/file/";

my $list = `ls -1 $LogPath*.log`;

my @array_list = split(/\n/, $list);

foreach my $file (@array_list){
    my $cat = `cat $file`;

    my @content = split(/\n/, $cat);

    foreach my $line (@content) {
        # ...
        #Doing some operation if the matching content found
        # ...
    }
}

Any suggestions on modifying this logic so that I can read each line of each log file faster would be highly appreciated.

Upvotes: 0

Views: 675

Answers (3)

Radek Zika

Reputation: 11

You may consider/test using the module

File::Slurp (https://metacpan.org/pod/File::Slurp)

Seriously, don't use external command calls in a loop - that kills performance!

Slurping is a convenient way to get text-file data into a variable. Depending on your expected memory usage, the buf_ref option may be your option of choice for the read_file function. Combined with the fork suggestion above, you can speed up your log ingestion considerably.
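For instance, here is a minimal sketch of that idea, with glob also replacing the external ls call from the question (the matching logic is still the placeholder from the question):

#!/usr/bin/perl
use strict;
use warnings;
use File::Slurp;

my $LogPath = "/path/to/log/file";

foreach my $file ( glob("$LogPath/*.log") ) {
    # buf_ref reads the whole file into $buf without an extra in-memory copy
    read_file( $file, buf_ref => \my $buf );
    foreach my $line ( split /\n/, $buf ) {
        # ...do some operation if the matching content is found...
    }
}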

Upvotes: 0

Kjetil S.

Reputation: 3777

If you're doing some filtering, as the comment in your innermost foreach suggests, and you're ignoring most of the lines in the logs, then you could try replacing my $cat = `cat $file`; with my $cat = `grep PATTERN $file`; to at least relieve Perl's memory use if the files are big. They may even be so big that they cause disk swapping from not having enough memory, in which case that, not the Perl code itself, is your real problem. In many if not most versions of grep, PATTERN can also be a Perl-style regexp with the -P option: grep -P 'REGEXP' file.
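A quick sketch of that replacement inside your loop (PATTERN is a placeholder for whatever you actually match on, and \Q...\E protects against shell metacharacters in the file names):

foreach my $file (@array_list) {
    my @content = split /\n/, `grep -P 'PATTERN' \Q$file\E`;
    foreach my $line (@content) {
        # every line reaching this loop already matched PATTERN
    }
}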

If the slowness is, say, 99% IO (disk reads and/or writes - you can find out by running time perl script.pl and checking whether real in the output of time is much larger than the other numbers), then there probably isn't much you can do, unless your system can produce compressed log files. Sometimes, if you have a slow disk (maybe a network-mounted one) and fast CPUs, decompressing+processing can be faster than just processing uncompressed files. Perhaps like this: my $cat = ` zcat $file.gz | grep PATTERN `;

You could also try parallelizing with fork by adding this outer for-loop:

my $LogPath = "/path/to/log/file";
my $list = `ls -1 $LogPath/*.log`;
my $jobs=4; #split work into 4 jobs (splits work to up to 4 CPUs)
for my $job (1..$jobs){
  next if !fork;
  my $i=0;
  my @array_list = grep $i++ % $jobs == $job-1, #do only what this process should
                   split(/\n/, $list);
  foreach my $file (@array_list){
    my $cat = `cat $file`;
    my @content = split(/\n/, $cat);
    foreach my $line (@content) {
      # ...
      #Doing some operation if the matching content found
      # ...
    }
  }
  last;
}
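Note that none of these processes waits for the child it forked, so the script's first process can exit while workers are still running. If that matters for your cron job, one line placed after the for-loop lets every process reap its own child, using Perl's built-in wait (which returns -1 once a process has no children left):

1 while wait() != -1;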

(By the way, for and foreach are synonymous; I don't know why so many Perl coders bother with the last four chars of foreach.)

Upvotes: 0

ikegami

Reputation: 385657

Start by using Perl's built-in functions instead of external programs to get the info you want.

my $log_dir_qfn = "/path/to/log/file";

my $error = 0;
for my $log_qfn (glob(quotemeta($log_dir_qfn) . "/*.log")) {
   open(my $fh, '<', $log_qfn)
      or do {
         warn("Can't open \"$log_qfn\": $!\n");
         ++$error;
         next;
      };

   while ( my $line = <$fh> ) {
      ...
   }
}

exit(1) if $error;

Not sure how much faster this is going to be. And there's not much that can be done to speed up what you are doing in the portion of the code you posted. If you want to read a file line by line, it's going to take the time it takes to read the file line by line.

Upvotes: 2
