coolaiddrinker

Reputation: 25

Perl DBI performance on nested SQL

I have a question about the performance of processing SQL in nested form with Perl DBI.

Basically, what the script does is:

connect to DB1

connect to DB2

fetch data from DB1

inside a while loop,

use data from DB1 and create an update or insert statement

execute the SQL in DB2

disconnect from DB2

disconnect from DB1
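
In Perl DBI terms, the first approach looks roughly like this (the DSNs, credentials, and table/column names below are just placeholders, not the real ones):

    #!/usr/bin/perl
    # Rough sketch only: driver, DSNs, credentials and table/column names
    # are placeholders, not the real ones.
    use strict;
    use warnings;
    use DBI;

    my $dbh1 = DBI->connect('dbi:mysql:db1', 'user1', 'pass1', { RaiseError => 1 });
    my $dbh2 = DBI->connect('dbi:mysql:db2', 'user2', 'pass2', { RaiseError => 1 });

    my $sth = $dbh1->prepare('SELECT id, value FROM source_table');
    $sth->execute;

    while (my ($id, $value) = $sth->fetchrow_array) {
        # build an UPDATE from the DB1 row; fall back to INSERT if no row matched
        my $rows = $dbh2->do('UPDATE target_table SET value = ? WHERE id = ?',
                             undef, $value, $id);
        $dbh2->do('INSERT INTO target_table (id, value) VALUES (?, ?)',
                  undef, $id, $value) if $rows == 0;
    }

    $dbh2->disconnect;
    $dbh1->disconnect;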

Now the question is: is this the best practice when it comes to performance, or is the method below a better alternative?

get data from DB1

write the data in a staging file

get data from the staging file

process the data to update in DB2
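
A rough sketch of that second approach (again with placeholder names, and a made-up staging file):

    #!/usr/bin/perl
    # Rough sketch only: the staging file name and all DB/table names are placeholders.
    use strict;
    use warnings;
    use DBI;

    my $staging = 'db1_extract.tsv';

    # step 1: dump the rows from DB1 into a flat file
    my $dbh1 = DBI->connect('dbi:mysql:db1', 'user1', 'pass1', { RaiseError => 1 });
    my $sth  = $dbh1->prepare('SELECT id, value FROM source_table');
    $sth->execute;
    open my $out, '>', $staging or die "open $staging: $!";
    while (my @row = $sth->fetchrow_array) {
        print {$out} join("\t", @row), "\n";
    }
    close $out;
    $dbh1->disconnect;

    # step 2: read the staging file back and apply it to DB2
    my $dbh2 = DBI->connect('dbi:mysql:db2', 'user2', 'pass2', { RaiseError => 1 });
    open my $in, '<', $staging or die "open $staging: $!";
    while (my $line = <$in>) {
        chomp $line;
        my ($id, $value) = split /\t/, $line;
        my $rows = $dbh2->do('UPDATE target_table SET value = ? WHERE id = ?',
                             undef, $value, $id);
        $dbh2->do('INSERT INTO target_table (id, value) VALUES (?, ?)',
                  undef, $id, $value) if $rows == 0;
    }
    close $in;
    $dbh2->disconnect;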

Edit: I have added "connect to DB2" after "connect to DB1", but that is not the point I am trying to make. This is more a question about the performance of the I/O and system calls the server makes to read the data from the database server and from the disk where the data is stored.

If you look at the first approach:

Inside a loop, the script gets one row of data from DB1 and waits until the returned data is processed in DB2.

If there are thousands of records, what is the impact on performance when you are making thousands of system calls to the server to read the data from disk?

Upvotes: 2

Views: 628

Answers (3)

Dallaylaen

Reputation: 5318

I would suggest doing that in parallel, e.g. using pipe() and fork() in your script. Otherwise most of the time you'll be waiting for the network IO. So it's like

(SELECT, serialize, print) | (read line, deserialize, INSERT_OR_UPDATE)
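
A rough sketch of that split, with made-up DSNs and table names; note that each process gets its own DBI connection after the fork, so no handle is shared:

    #!/usr/bin/perl
    # Rough sketch of the split above; DSNs and table/column names are made up.
    # Each side opens its own DBI handle after the fork, so nothing is shared.
    use strict;
    use warnings;
    use DBI;

    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;

    if ($pid) {
        # parent: SELECT from DB1, serialize one record per line, print to the pipe
        close $reader;
        my $dbh1 = DBI->connect('dbi:mysql:db1', 'user1', 'pass1', { RaiseError => 1 });
        my $sth  = $dbh1->prepare('SELECT id, value FROM source_table');
        $sth->execute;
        while (my @row = $sth->fetchrow_array) {
            print {$writer} join("\t", @row), "\n";
        }
        close $writer;    # EOF tells the child it is done
        $dbh1->disconnect;
        waitpid($pid, 0);
    }
    else {
        # child: read a line, deserialize, INSERT or UPDATE into DB2
        close $writer;
        my $dbh2 = DBI->connect('dbi:mysql:db2', 'user2', 'pass2', { RaiseError => 1 });
        my $upd  = $dbh2->prepare('UPDATE target_table SET value = ? WHERE id = ?');
        my $ins  = $dbh2->prepare('INSERT INTO target_table (id, value) VALUES (?, ?)');
        while (my $line = <$reader>) {
            chomp $line;
            my ($id, $value) = split /\t/, $line;
            my $rows = $upd->execute($value, $id);
            $ins->execute($id, $value) if $rows == 0;
        }
        close $reader;
        $dbh2->disconnect;
        exit 0;
    }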

Upvotes: 0

Lumi

Reputation: 15284

If there are thousands of records, what is the impact on performance when you are making thousands of system calls to the server to read the data from disk?

Thousands of records, that's not much, given typical small or average record sizes. You don't know whether the data is actually read from the disk, or rather residing in buffers in memory. So you can't be sure of the number of system calls. And why would you bother?

If performance matters to you, I'd suggest you test both methods in a typical, real-life scenario.

If you're dealing with a locking database and a long-running process (for whatever reason), it might be preferable to get all the data you need at once (staging file method) so you don't hold the locks on the origin and the destination server for the whole duration of the processing.

Your choice might also depend on your familiarity with tools or APIs to do bulk import of data. For example, bcp.exe for MSSQL is a handy tool for exporting and importing data. Most other databases have similar tools.

Upvotes: 1

indiguy

Reputation: 504

I only have moderate experience with Perl, but the work I have done involved Perl and databases. As I understand it, it is typically best to connect to databases outside of loops -- that way you minimize the overhead of connecting, disconnecting, and reconnecting repeatedly.

So...

connect to DB1

connect to DB2

fetch data from DB1

inside a while loop

use data from DB1 and create an update or insert statement

execute the SQL in DB2

disconnect from DB1

disconnect from DB2
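
As far as I understand, in DBI that could look something like the following (DSNs, credentials, and table/column names are made up; the statements are also prepared once outside the loop, which should cut the per-row overhead a bit more):

    #!/usr/bin/perl
    # Rough sketch only: DSNs, credentials and table/column names are made up.
    use strict;
    use warnings;
    use DBI;

    # connect once, outside any loop
    my $dbh1 = DBI->connect('dbi:mysql:db1', 'user1', 'pass1', { RaiseError => 1 });
    my $dbh2 = DBI->connect('dbi:mysql:db2', 'user2', 'pass2', { RaiseError => 1 });

    # prepare once, execute many times inside the loop
    my $select = $dbh1->prepare('SELECT id, value FROM source_table');
    my $update = $dbh2->prepare('UPDATE target_table SET value = ? WHERE id = ?');
    my $insert = $dbh2->prepare('INSERT INTO target_table (id, value) VALUES (?, ?)');

    $select->execute;
    while (my ($id, $value) = $select->fetchrow_array) {
        my $rows = $update->execute($value, $id);
        $insert->execute($id, $value) if $rows == 0;
    }

    $dbh1->disconnect;
    $dbh2->disconnect;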

Upvotes: 2
