JohnP
JohnP

Reputation: 65

How can I improve the performance of a Perl/DBI script that selects 1 million+ rows from one database server and inserts them on another?

I am using the latest versions of MySQL (workbench), Perl, and DBI.

I have two unique databases each on unique servers. I am querying table1 on DB1 on Server1, loading into an array, then with a while inserting said table into table2 on DB2 on Server2.

It is not a direct copy as I am actually doing a select on the table 1.

I am trying to speed up my script or if necessary change the queries.

I am using binds and have moved the prepare statements out of the loop and that has really helped. However, I still end up making 1.6 million calls to insert to table2/db2/server2, and that is using vast amounts of time (3100 seconds or about 2-3 ms per call).

There are two subs &insert2tempData which is used in the while loop to load my DB2 table2. It is called 1.6 million times AND query_dev_load_local_dbs which is used to query DB1 then call the second sub.

For simplicity I left this prepare in the loop, but in my code this is not in the while loop it is in the sub that calls this sub.

sub insert2tempData {
    $query4tmpData = $Locdbh_NetCool->prepare ("INSERT INTO kpincpolldata.kpitempData
        (monitoredObjectId, monitoredInstId, pollTime, tdwTime, errorCode, value)
        VALUES(?, ?, ?, ?, ?, ?)");

    $query4tmpData->execute($row[0], $row[1], $row[2], $row[3], $row[4],
                            $row[5]);
    warn "Problem in retrieving results", $query4tmpData->errstr(), "\n"
        if $query4tmpData->err();

} #End sun insert to tempData

This is the main sub used to query the server1, DB1, and table1. It calls the while loop

sub query_dev_load_local_dbs {
    $queryRemote = $Devdbh_NetCool->prepare("SELECT * FROM ncpolldata.pollData
                                   WHERE pollTime >= $STARTU AND pollTime < $ENDU
                                   AND monitoredObjectId = 1");

    $queryRemote->execute();

    while (@row = $queryRemote->fetchrow_array()) {

        #** Call to sub insert2tempData**
        &insert2tempData($Locdbh_NetCool);

        warn &print2log ("Problem in retrieving results"), $queryRemote->errstr(), "\n"
            if $queryRemote->err();
    } # End while

} # End sub query_dev_load_local_dbs

Upvotes: 4

Views: 2687

Answers (2)

Jon Ericson
Jon Ericson

Reputation: 21525

It's entirely possible the code you posted above has nothing to do with your perceived performance issue. It might take a long time to insert that many rows on the database server you are using. Here's what I'd look at:

  1. Turn AutoCommit off, it's enabled.

    Committing each time you insert can have serious performance implications on Server2 depending on how it's configured. It's entirely possible that batching up the entire process into one transaction could speed things up. But it really depends on the way the server is configured, how indexes are built, etc.

    Which brings us to:

  2. A competent DBA might be able to help.

    Check with whoever it is that manages your MySQL servers and see if they can help you troubleshoot. Who knows, they might even have DBA tools that will help solve your problem without you doing anything different in the code or even allow you to scrap this script altogether.

    Unless you are the DBA:

  3. You might poke around the Stack Exchange site for DBAs.

There might be some minor things to fix in the code you posted, but if the database is the bottleneck, it won't matter. Find the bottleneck!

Upvotes: 4

simbabque
simbabque

Reputation: 54373

I'd use fetchrow_arrayref() instead of fetchrow_array() because that way it doesn't have to make copies of all the return values each time you fetch a row. That should speed things up a bit. You might also take a look at bind_columns.

Then I didn't quite get what you meant with the while loop and the prepare, so I say that anyway: Move the $query4tmpData=$Locdbh_NetCool->prepare out of the sub and to somewhere else where it can be prepared once and then reuse it in the sub. You probably have that.


A few more things:

  • You should always use strict and use warnings. I don't see my in your subs. Please use those pragmas. They make your life easier.
  • The warning you issue when you cannot INSERT reads "Problem in retrieving results". That might confuse you if it occurs.

Upvotes: 4

Related Questions