limeri

Reputation: 131

Perl creates objects very slowly

I have a perl script that reads ~50,000 rows from a database and stores them in an array of hashes. Standard DBI code. Rather than work directly on hashes, I prefer to put the data into objects that I can pass to other code modules very cleanly. The table I'm reading from has 15+ columns in it. My code basically looks like:

my $db = DBI->connect(); # Just pretend you see a proper DBI connect here
my $resultSet = $db->selectall_arrayref($sql);
$db->disconnect();

# Here's where the problem starts.
my %objects;
for my $row (@{$resultSet}) {
    my ($col1, $col2, ..., $col15) = @{$row};
    my %inputHash;
    $inputHash{col1} = $col1 if $col1;
    ...
    $inputHash{col15} = $col15 if $col15;
    my $obj = Model::Object->new(%inputHash);
    $objects{$col1} = $obj;
}
return values %objects;

It collects everything into a hash to eliminate duplicates from the select. The problem is in the loop below the "Here's where the problem starts" comment. I've added a log message for every 100 objects created. The first 100 objects were created in 5 secs, the next 100 took 16 secs, and getting to 300 took 30 more secs. It's now up to 9,000 objects and is taking 12+ minutes per 100 objects. I didn't think that 50,000 objects was large enough to create these kinds of issues.

The Model::Object that's being created is a class with getters and setters for each of the properties. It has a new method and a serialize method (essentially a toString) and that's it. There's no logic to it.
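For reference, a stripped-down sketch of that kind of class. This is only an approximation of what I described, not the real Model::Object, and the accessor shown is a placeholder for the 15 real ones:

package Model::Object;
use strict;
use warnings;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# One getter/setter per column; col1 shown as an example
sub col1 {
    my $self = shift;
    $self->{col1} = shift if @_;
    return $self->{col1};
}

# Rough stand-in for the serialize (toString) method
sub serialize {
    my $self = shift;
    return join ',', map { "$_=" . ( $self->{$_} // '' ) } sort keys %$self;
}

1;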

I'm running ActiveState Perl 5.16 on a Windows laptop with 8 GB of RAM, an i7 processor (3 yrs old) and an SSD drive with reasonable space. I've seen this on a Linux machine with the same version of Perl, so I don't think it's a hardware thing. I need to stay on 5.16 of AS Perl. Any advice about how to improve performance would be appreciated. Thanks.

Upvotes: 3

Views: 278

Answers (2)

Borodin

Reputation: 126722

As you have read, it is imperative that you use a profiler to determine where the bottlenecks are in your code before you progress far with optimising. However, as I described in my comment, it is possible to rewrite your loop differently so that unused hashes aren't unnecessarily created and discarded

You should also see an improvement from passing the hash by reference instead of as a simple list of keys and values

Here's a modification of your code that should give you some ideas

use constant COLUMN_NAMES => [ qw/
  col1  col2  col3  col4  col5
  col6  col7  col8  col9  col10
  col11 col12 col13 col14 col15 
/ ];

sub object_results {

    my $dbh = DBI->connect($dsn, $user, $pass);
    my $result_set = $dbh->selectall_arrayref($sql);
    $dbh->disconnect;

    # Work through the rows in reverse so that, together with the
    # "next if exists" test below, the last row for a given key wins,
    # matching the original overwrite behaviour but without building
    # objects for the duplicates
    my %objects;
    for ( my $i = $#$result_set; $i >= 0; --$i ) {
        my $row = $result_set->[$i];
        next if exists $objects{$row->[0]};

        # Build the constructor arguments, skipping undefined columns
        my %input_hash;
        for my $col ( 0 .. $#$row ) {
            my $v = $row->[$col];
            next unless defined $v;
            $input_hash{COLUMN_NAMES->[$col]} = $v;
        }

        # Pass a reference rather than a flattened list of keys and values
        $objects{$input_hash{col1}} = Model::Object->new(\%input_hash);
    }

    values %objects;
}

Upvotes: 1

Patrick J. S.

Reputation: 2935

First of all: profile your program! You have already narrowed it down to one sub; with Devel::NYTProf (for example) you can narrow it down to the exact line that is the culprit.
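A typical NYTProf run looks something like this (the script name is a placeholder; by default the profiler writes nytprof.out in the current directory, and nytprofhtml turns that into an HTML report):

# Run the script under the profiler, then turn the data into an HTML report
perl -d:NYTProf your_script.pl
nytprofhtml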

Here are some general considerations from my side. Just from glancing over it, some probable slowing factors spring to mind, but you can't be sure until you profile your program:

Maybe hash allocation takes too long. As your %objects hash grows, perl will steadily have to allocate more memory for it. You can pre-set the size of the %objects hash; this feature is documented under perldoc -f keys. Since this is a memory allocation problem, you wouldn't notice it if you profile with too small a data set.

# somewhere outside of the loop: pre-size the hash so perl doesn't have
# to grow it repeatedly (the bucket count is rounded up to a power of two)
keys(%objects) = $number_of_rows * 1.2;
# the hash should be a little bigger than the number of objects to be stored in it

Secondly, it could be that the object creation takes too long. Take a look at Model::Object; I don't know what's in there, so I can't comment on it. But you should most certainly consider passing %inputHash as a reference. With Model::Object->new(%inputHash); you put all the keys and values on the stack, and the constructor then has to rebuild them, in the worst case as my %options = @_;. That way the hash is recomputed, key by key, for every single object.
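A minimal sketch of the difference, assuming a plain hash-based class (this is not the real Model::Object, and the constructor names are only illustrative):

package Model::Object;
use strict;
use warnings;

# List-style constructor: new_from_list(%inputHash) flattens the hash
# onto the stack, and this line rebuilds it, rehashing every key
sub new_from_list {
    my ( $class, %options ) = @_;
    return bless {%options}, $class;
}

# Reference-style constructor: new_from_ref(\%inputHash) passes a single
# scalar. Blessing the caller's hash directly avoids copying it at all,
# which is safe here because the loop builds a fresh %inputHash each time
sub new_from_ref {
    my ( $class, $options ) = @_;
    return bless $options, $class;
}

1;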

Maybe you can come up with a way to get rid of the small %inputHash completely. Off the top of my head I can only think of approaches based on definedness, but you are checking for truthiness (are you sure that's right, by the way? "0" is false, for example). One such approach is sketched below.
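A rough sketch of that idea, assuming the column names are available in an array in select order (@column_names is a placeholder) and that the constructor accepts a hash reference:

# Build the constructor argument inline, keeping only the defined columns,
# so the named temporary hash disappears
my @column_names = qw( col1 col2 col3 );   # placeholder: your 15 real column names

my $obj = Model::Object->new( {
    map  { $column_names[$_] => $row->[$_] }
    grep { defined $row->[$_] }
    0 .. $#$row
} );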

But again, most importantly: profile your program. You could use a smaller data set, but then you wouldn't see the memory allocation problems as clearly. With profiling you will see exactly where your program spends the most time.

The perldoc page perlperf has something to say about speeding up your program, and it has a nice section about profiling, too.

Upvotes: 5
