Mr. Llama

Reputation: 20899

Perl DBI - For loop killing performance?

I'm working on a Perl script that uses DBI to unload data from a database table into a specific format. I've got something working, but the performance is... lacking.

Here's the performance critical section of the code:

while (my $row = $query->fetchrow_arrayref()) {
    # Sanitize the columns to make sure certain characters are escaped with a backslash.
    # The escaping is required as some binary data may be included in some columns.
    # This must occur *before* the join() as $COLUMN_DELIM_STR may contain one of the special characters.
    for $col (@$row) { $col =~ s/(?=[\x5C\x00-\x1F])/\\/g; }

    # Output the sanitized row
    print join($COLUMN_DELIM_STR, @$row) . $RECORD_DELIM_STR;
}

I've got a test table with 5 columns and 10 million rows. The total unload time comes out to 90 seconds (the output is redirected to /dev/null so disk writes won't interfere with the benchmarks).

After experimenting with removing chunks of code to get an idea of how they affect performance, I came to the realization that the sanitization loop accounts for a large amount of processing time, around 30 seconds (about a third of the total execution time). Setting DBI_PROFILE=4 shows that the fetching itself takes around 45 seconds.

Here's the kicker: removing the actual substitution step ($col =~ s/(?=[\x5C\x00-\x1F])/\\/g;) only saves about 12 seconds of processing time. That means that the do-nothing for loop (for $col (@$row) { ; }) incurs 18 seconds of overhead, more than the substitution itself. (This was verified by removing the loop entirely.)

Summary:

  • Total unload time: ~90 seconds (5 columns, 10 million rows, output to /dev/null)
  • Fetching (per DBI_PROFILE=4): ~45 seconds
  • Sanitization loop: ~30 seconds, of which the substitution itself accounts for only ~12 seconds
  • Overhead of the do-nothing loop: ~18 seconds

Question:

How do I improve the performance of the sanitization step?
Bonus: Why is the for loop overhead so high?

Upvotes: 0

Views: 258

Answers (2)

Schwern

Reputation: 165198

If all you want to do is dump a table as a delimited file, let the database do it. MySQL has SELECT ... INTO OUTFILE; other databases have similar facilities. This avoids the overhead of copying all the data into your program, altering it, and spitting it back out again. A rough sketch is below.
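A minimal sketch of that approach, assuming MySQL (the DSN, table name, and output path are placeholders; note that INTO OUTFILE writes the file on the database server's filesystem, not the client's):

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:mysql:mydb', 'user', 'pass', { RaiseError => 1 });

# A single-quoted heredoc so the backslashes reach MySQL exactly as written.
$dbh->do(<<'SQL');
SELECT *
INTO OUTFILE '/tmp/my_table.dump'
  FIELDS TERMINATED BY ',' ESCAPED BY '\\'
  LINES TERMINATED BY '\n'
FROM my_table
SQL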


Another option is to do the escaping in the SELECT. In Oracle you can use REGEXP_REPLACE. This should do it (I may have got the details on the backslashes wrong).

REGEXP_REPLACE(column, '([^[:print:]])', '\\\\1')

The problem now is doing it to every column. You don't know how many columns you have or their names, but you can find out easily enough with SELECT * FROM table LIMIT 1 and $sth->fetchrow_hashref, or more directly with $dbh->column_info. Now you can construct a SELECT with the right number of columns and apply REGEXP_REPLACE to each of them. This may be faster. You could even do the join in the SELECT. A rough sketch follows.
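Assuming Oracle, with the connection details and the table name MY_TABLE as placeholders (and with the caveat above about the backslashes), it could look something like this:

use strict;
use warnings;
use DBI;

my $dbh = DBI->connect('dbi:Oracle:mydb', 'user', 'pass', { RaiseError => 1 });

# column_info() returns one row per column; COLUMN_NAME is a standard field.
my $info    = $dbh->column_info(undef, undef, 'MY_TABLE', '%');
my @columns = map { $_->{COLUMN_NAME} } @{ $info->fetchall_arrayref({}) };

# Wrap every column in REGEXP_REPLACE so the escaping happens in the database.
# "\\\\\\\\1" in the Perl source becomes \\\\1 in the SQL text, matching the
# snippet above.
my $select_list = join ', ',
    map { "REGEXP_REPLACE($_, '([^[:print:]])', '\\\\\\\\1')" } @columns;

my $sth = $dbh->prepare("SELECT $select_list FROM MY_TABLE");
$sth->execute;

while (my $row = $sth->fetchrow_arrayref) {
    print join("\t", @$row), "\n";    # delimiters are placeholders
}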

You could even write a PL/SQL function to do all this for you. This will likely be the most efficient. Here's an example of writing a string join function that can be adapted to also do the escaping.


As to why the empty loop is slow: you're running it 50 million times, though 18 seconds still seems quite high. My 2011 MacBook Pro can run it in about 6 seconds. Let's verify the empty loop is the problem: how long does this code take?

time perl -wle 'my $rows = [1..5]; for my $row (1..10_000_000) { for $col (@$rows) {} }'

Simply iterating 50 million times (for (1..50_000_000)) takes a third of the time, so maybe there's a way to micro-optimize the inner loop. I'll spare you the experimentation: it turns out a map in void context with no block is significantly faster.

map s{(?=[\x5C\x00-\x1F])}{\\}g, @$rows;

Why? Dumping the op tree with B::Terse tells us Perl is doing less work in the map. Here's what the inner for loop is doing:

    UNOP (0x1234567890ab) null 
        LOGOP (0x1234567890ab) and 
            OP (0x1234567890ab) iter 
            LISTOP (0x1234567890ab) lineseq 
                COP (0x1234567890ab) nextstate 
                BINOP (0x1234567890ab) leaveloop 
                    LOOP (0x1234567890ab) enteriter 
                        OP (0x1234567890ab) null [3] 
                        UNOP (0x1234567890ab) null [147] 
                            OP (0x1234567890ab) pushmark 
                            UNOP (0x1234567890ab) rv2av [7] 
                                OP (0x1234567890ab) padsv [1] 
                        PADOP (0x1234567890ab) gv  GV (0x1234567890ab) *_ 
                    UNOP (0x1234567890ab) null 
                        LOGOP (0x1234567890ab) and 
                            OP (0x1234567890ab) iter 
                            LISTOP (0x1234567890ab) lineseq 
                                COP (0x1234567890ab) nextstate 
                                PMOP (0x1234567890ab) subst 
                                    SVOP (0x1234567890ab) const [12] PV (0x1234567890ab) "2" 
                                OP (0x1234567890ab) unstack 
                OP (0x1234567890ab) unstack 

and here's the map.

    UNOP (0x1234567890ab) null 
        LOGOP (0x1234567890ab) and 
            OP (0x1234567890ab) iter 
            LISTOP (0x1234567890ab) lineseq 
                COP (0x1234567890ab) nextstate 
                LOGOP (0x1234567890ab) mapwhile [8] 
                    LISTOP (0x1234567890ab) mapstart 
                        OP (0x1234567890ab) pushmark 
                        UNOP (0x1234567890ab) null 
                            PMOP (0x1234567890ab) subst 
                                SVOP (0x1234567890ab) const [12] PV (0x1234567890ab) "2" 
                        UNOP (0x1234567890ab) rv2av [7] 
                            OP (0x1234567890ab) padsv [1] 
                OP (0x1234567890ab) unstack 

Basically, the for loop has to go through the extra work of setting up a new lexical context for every iteration. The map does not, but only so long as you avoid the block form. Interestingly, s/1/2/ for @$rows compiles to nearly the same ops as for (@$rows) { s/1/2/ }.
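If you want to check this on your own perl, here's a quick Benchmark sketch comparing the three spellings. The test strings and element count are made up, and the absolute numbers (and possibly even the ranking) will differ between machines and perl versions:

use strict;
use warnings;
use Benchmark qw( cmpthese );

my @template = ("abc\x00def") x 5;    # 5 columns, one byte needing escaping

cmpthese(-3, {
    for_block => sub {
        my @row = @template;          # fresh copy, so each run escapes the same input
        for my $col (@row) { $col =~ s/(?=[\x5C\x00-\x1F])/\\/g }
    },
    map_void => sub {
        my @row = @template;
        map s/(?=[\x5C\x00-\x1F])/\\/g, @row;    # EXPR form, no block
    },
    postfix_for => sub {
        my @row = @template;
        s/(?=[\x5C\x00-\x1F])/\\/g for @row;
    },
});

The copy of @template at the top of each sub keeps the comparison fair: without it, every pass would also escape the backslash added by the previous pass.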

Upvotes: 5

ikegami

Reputation: 386361

For me,

  • The test harness plus the substitution takes 3.57 µs per element (for seven-character strings with one character in need of escaping).
  • The test harness plus the loop takes 0.960 µs, plus 0.141 µs per element.
  • Looping over 5 elements thus comes to about 1.66 µs (0.960 + 5 × 0.141).

The numbers can vary in practice, but the ratio is much more in line with my expectations than what you claim. Performing a regex-based substitution is rather costly, but incrementing a counter is not, so the loop should be much cheaper than the substitution.


use strict;
use warnings;

use Benchmark qw( timethese );

my %tests = (
   'for'  => 'my $_col = our $col; our $row; for my $col (@$row) { }',
   's///' => 'my $_col = our $col; $_col =~ s/(?=[\\x5C\\x00-\\x1F])/\\\\/g;',
);

$_ = 'use strict; use warnings; '.$_ for values %tests;

{
   local our $row = [('a')x1000];
   local our $col = "abc\x00def";
   timethese(-3, \%tests);
}
{
   local our $row = [];
   local our $col = "abc\x00def";
   timethese(-3, \%tests);
}

Output:

  • for (1000 elements): 7065.42/s
  • for (0 elements): 1041030.65/s
  • s///: 284348.25/s

Upvotes: 1
