user1990571

Reputation: 9

Not able to print multiple records quickly in a Perl script

I have a Perl script which displays information by date. I fetch the data for a given date and keep it in a flat file whose field separator is "AaBbCc".

The file names look like:

18-01-13_REPORT
17-01-13_REPORT

The records look like this (each pair of lines forms a single record):

111 AaBbCc 2222 AaBbCc 3333 AaBbCc ...

Each file has more than 5000 records. With my code I can print the records successfully, but it takes 15 minutes to print them all. Now I want to optimise the code so the records print in less time.

Here is my code:

open (FILE,"$filename/$val_REPORT_DATE") 
    or die "Could not read from $filename, program halting.";
local $/ = undef; #get the whole file in to picture
while(<FILE>) {
    chomp $_;
    @fields = split('AaBbCc', $_);
    for ( $i=0 ; $i<$count ; ) {
        print "<tr><td>" . $fields[$i+0] .
              "</td><td>". $fields[$i+1] .
              "</td><td>". $fields[$i+2] .
              "</td><td>". $fields[$i+3] .
              "</td><td>". $fields[$i+4] .
              "</td><td>". $fields[$i+5] .
              "</td><td>". $fields[$i+6] .
                           $fields[$i+7] ."</td></tr>";
        $i = $i + 8;
    }
}

Please help me print all the records in a shorter time to improve performance.

Thanks in advance !!! Vijay

Upvotes: 0

Views: 164

Answers (1)

amon

Reputation: 57600

There are a few reasons why your code is slow:

  1. The file format used is braindead.
  2. For every element you look up in @fields, you perform an addition. While this wouldn't be overly costly in a low-level language, Perl's scalars are quite expensive (a rough Benchmark comparison is sketched after this list).

    Here is one of the many parts of your opcode tree that perform the array element lookups:

    35               <2> aelem sK/2 ->36
    31                  <1> rv2av sKR/1 ->32        # get actual array
    30                      <#> gv[*fields] s ->31  # get global typeglob
    34                  <2> add[t63] sK/2 ->35      # create new scalar here
    -                       <1> ex-rv2sv sK/1 ->33  # get actual scalar
    32                         <#> gvsv[*i] s ->33  # get global typeglob
    33                      <$> const[IV 7] s ->34
    

    Compare this with an element lookup with lexical variables, and without the addition:

    c        <2> aelem sK/2 ->d
    a           <0> padav[@fields:47,49] sR ->b     # beautiful
    b           <0> padsv[$i:48,49] s ->c           # beautiful
    
  3. There are a few best practices that would make the code faster, e.g. using lexical variables declared with my. Global variables are looked up differently, and far more slowly (see above). Also, there is no need to slurp the whole file in at once. Why use that much memory when it can be done in constant space?
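
To put rough numbers on points 2 and 3, here is a small Benchmark sketch (not from the original answer; the data and the eight-fields-per-record layout are invented to mirror the question). It compares the global-array-plus-index-arithmetic access pattern with a plain foreach over a lexical array:

#!/usr/bin/perl
use strict; use warnings;
use Benchmark qw(cmpthese);

# made-up data: 1000 records of 8 fields each
our @fields = map { "field$_" } 1 .. 8000;   # package (global) array, as in the question
our $i;
my  @lex_fields = @fields;                   # lexical copy for the second variant

cmpthese(-2, {
    global_index => sub {                    # global array, index arithmetic
        my $out = "";
        for ($i = 0; $i < @fields; $i += 8) {
            $out .= $fields[$i + $_] for 0 .. 7;
        }
    },
    lexical_each => sub {                    # lexical array, no index arithmetic
        my $out = "";
        $out .= $_ for @lex_fields;
    },
});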

Put together, a skeleton that uses lexical variables and reads two lines at a time instead of slurping looks like this:

#!/usr/bin/perl
use strict; use warnings; # every good script starts with these
use 5.010;

my $filename = shift @ARGV;  # however the day's report file is chosen
open my $file, "<", $filename or die qq(Couldn't open "$filename": $!);
until (eof $file) {
  # read two lines at a time
  my $line1 = <$file>;
  my $line2 = <$file> // die qq(uneven number of lines in "$filename");
  ...
}

Now we can fill in the loop body in a few different ways. Here is one that emphasizes dataflow programming (read from the bottom upwards):

print
    "<tr>" . (
       join "" =>
       map  "<td>$_</td>",
       map  {chomp; split /AaBbCc/}
            ($line1, $line2)
    ) . "</tr>\n"
    ;

The same algorithm could be encoded as

chomp($line1, $line2);
my $string = "";
$string .= "<td>$_</td>" for split(/AaBbCc/, $line1), split(/AaBbCc/, $line2);
print "<tr>$string</tr>\n";

We could also abuse special variables:

chomp($line1, $line2);
my @fields = (split(/AaBbCc/, $line1), split(/AaBbCc/, $line2)); # parens needed: assignment binds tighter than the list comma
local $" = "</td><td>"; # list separator
print "<tr><td>@fields</td></tr>\n";

Or, without a named array:

chomp($line1, $line2);
local $" = "</td><td>";
print "<tr><td>@{[split(/AaBbCc/, $line1), split(/AaBbCc/, $line2)]}</td></tr>\n";

What I don't do is manually calculate indices or unroll loops.

While it isn't guaranteed that these variants will run faster, you now have some material to experiment with. To really optimize your code, turn to the Devel::NYTProf profiler. It produces very detailed line-by-line reports showing how many times each statement was executed and how long it took on average.
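
For example, a typical NYTProf session looks like this (report.pl is just a placeholder for your script's name):

perl -d:NYTProf report.pl    # run under the profiler; writes nytprof.out
nytprofhtml                  # convert nytprof.out into HTML reports under nytprof/

Then open nytprof/index.html and look at the hottest lines first.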


Assuming that none of your fields contains tabs, here is a script to transform your data into a sane tab-separated format:

#!/usr/bin/perl
use strict; use warnings; use feature 'say';

# usage perl convert.pl filenames...

for my $filename (@ARGV) {
  open my $oldfile, "<", $filename       or die qq(Can't open "$filename": $!);
  open my $newfile, ">", "$filename.new" or die qq(Can't open "$filename.new": $!);
  until (eof $oldfile) {
     my $line1 = <$oldfile> // die qq(unexpected eof in "$filename");
     my $line2 = <$oldfile> // die qq(unexpected eof in "$filename": uneven number of lines);
     chomp( $line1, $line2 );
     my @fields = map {split /AaBbCc/, $_, 4} $line1, $line2;
     say $newfile join "\t" => @fields;
  }
  rename $filename       => "$filename.bak" or die qq(Can't back up "$filename": $!);
  rename "$filename.new" => $filename       or die qq(Can't replace "$filename" with transformed data: $!);
}
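
Once the files are in that tab-separated form, reading them back is a one-line split per record. A minimal sketch of the reading side (assuming one record per line as produced above; the HTML row layout is the one from the question):

#!/usr/bin/perl
use strict; use warnings;

# read one converted file: one record per line, fields separated by tabs
my $filename = shift @ARGV or die "usage: $0 filename\n";
open my $file, "<", $filename or die qq(Can't open "$filename": $!);

while (my $line = <$file>) {
    chomp $line;
    my @fields = split /\t/, $line;
    print "<tr>", (map "<td>$_</td>", @fields), "</tr>\n";
}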

Upvotes: 2
