Reputation: 25
I have a question about the performance of processing SQL in a nested fashion with Perl DBI.
Basically, what the script is doing is (a rough DBI sketch follows the outline):
connect to DB1
connect to DB2
fetch data from DB1
inside a while loop,
use data from DB1 and create an update or insert statement
execute the SQL in DB2
disconnect from DB2
disconnect from DB1
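In Perl DBI terms, the first approach might look roughly like the sketch below. The DSNs, credentials, and the source_table/target_table names and columns are hypothetical placeholders, not anything from the actual script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical DSNs, credentials, table and column names -- adjust to your setup.
my $dbh1 = DBI->connect('dbi:Oracle:DB1', 'user1', 'pass1', { RaiseError => 1 });
my $dbh2 = DBI->connect('dbi:Oracle:DB2', 'user2', 'pass2',
                        { RaiseError => 1, AutoCommit => 0 });

# Fetch the source rows from DB1 once.
my $src = $dbh1->prepare('SELECT id, name, amount FROM source_table');
$src->execute;

# Prepare the target statement once; re-execute it with new bind values per row.
my $upd = $dbh2->prepare('UPDATE target_table SET name = ?, amount = ? WHERE id = ?');

while ( my ($id, $name, $amount) = $src->fetchrow_array ) {
    # One round trip to DB2 for every row coming out of DB1.
    my $updated = $upd->execute($name, $amount, $id);   # returns "0E0" (numerically 0) if no row matched
    if ($updated == 0) {
        $dbh2->do('INSERT INTO target_table (id, name, amount) VALUES (?, ?, ?)',
                  undef, $id, $name, $amount);
    }
}

$dbh2->commit;
$dbh2->disconnect;
$dbh1->disconnect;
```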
Now the question is: is this the best practice when it comes to performance, or is the method below (also sketched in code) a better alternative?
get data from DB1
write the data to a staging file
read the data back from the staging file
process the data to update DB2
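The staging-file variant might look something like this; again the file path, delimiter, and table/column names are just illustrative assumptions:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Step 1: dump the data from DB1 into a staging file (hypothetical names throughout).
my $dbh1 = DBI->connect('dbi:Oracle:DB1', 'user1', 'pass1', { RaiseError => 1 });
my $src  = $dbh1->prepare('SELECT id, name, amount FROM source_table');
$src->execute;

open my $staging, '>', '/tmp/staging.tsv' or die "open: $!";
while ( my @row = $src->fetchrow_array ) {
    print {$staging} join("\t", @row), "\n";   # assumes no tabs/newlines in the data
}
close $staging;
$dbh1->disconnect;

# Step 2: read the staging file back and apply it to DB2.
my $dbh2 = DBI->connect('dbi:Oracle:DB2', 'user2', 'pass2',
                        { RaiseError => 1, AutoCommit => 0 });
my $ins  = $dbh2->prepare('INSERT INTO target_table (id, name, amount) VALUES (?, ?, ?)');

open $staging, '<', '/tmp/staging.tsv' or die "open: $!";
while ( my $line = <$staging> ) {
    chomp $line;
    my ($id, $name, $amount) = split /\t/, $line;
    $ins->execute($id, $name, $amount);
}
close $staging;

$dbh2->commit;
$dbh2->disconnect;
```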
Edit: I have added "connect to DB2" after "connect to DB1", but that is not the point I am trying to make. This is more a question about the performance of the I/O and system calls the server makes to read from the database server and from the disk where the data is stored.
If you look at the first approach: inside the loop, the script gets one row of data from DB1 and waits until the returned data has been processed in DB2.
If there are thousands of records, what is the impact on performance when you are making thousands of system calls to the server to read the data from disk?
Upvotes: 2
Views: 628
Reputation: 5318
I would suggest doing that in parallel, e.g. using pipe() and fork() in your script. Otherwise you'll spend most of the time waiting for the network I/O. So it's like
(SELECT, serialize, print) | (read line, deserialize, INSERT_OR_UPDATE)
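As a hedged sketch of that pipeline (hypothetical DSNs and table names, and tab-separated serialization assumed to be safe for the data): the parent SELECTs from DB1 and prints rows into a pipe, while a forked child reads them back and writes to DB2.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

pipe(my $reader, my $writer) or die "pipe: $!";

my $pid = fork();
die "fork: $!" unless defined $pid;

if ($pid) {
    # Parent: SELECT from DB1, serialize each row, print it into the pipe.
    close $reader;
    my $dbh1 = DBI->connect('dbi:Oracle:DB1', 'user1', 'pass1', { RaiseError => 1 });
    my $src  = $dbh1->prepare('SELECT id, name, amount FROM source_table');
    $src->execute;
    while ( my @row = $src->fetchrow_array ) {
        print {$writer} join("\t", @row), "\n";
    }
    close $writer;              # EOF tells the child there is no more data
    $dbh1->disconnect;
    waitpid($pid, 0);
} else {
    # Child: read lines from the pipe, deserialize, INSERT or UPDATE into DB2.
    close $writer;
    my $dbh2 = DBI->connect('dbi:Oracle:DB2', 'user2', 'pass2', { RaiseError => 1 });
    my $upd  = $dbh2->prepare('UPDATE target_table SET name = ?, amount = ? WHERE id = ?');
    while ( my $line = <$reader> ) {
        chomp $line;
        my ($id, $name, $amount) = split /\t/, $line;
        if ($upd->execute($name, $amount, $id) == 0) {
            $dbh2->do('INSERT INTO target_table (id, name, amount) VALUES (?, ?, ?)',
                      undef, $id, $name, $amount);
        }
    }
    close $reader;
    $dbh2->disconnect;
    exit 0;
}
```

Each database handle is created after the fork, so the parent and child never share a DBI connection.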
Upvotes: 0
Reputation: 15284
If there are thousands of records, what is the impact on performance when you are making thousands of system calls to the server to read the data from disk?
Thousands of records, that's not much, given typical small or average record sizes. You don't know whether the data is actually read from the disk, or rather residing in buffers in memory. So you can't be sure of the number of system calls. And why would you bother?
If performance matters to you, I'd suggest you test both methods in a typical, real-life scenario.
If you're dealing with a locking database and a long-running process (for whatever reason), it might be preferable to get all the data you need at once (staging file method) so you don't hold the locks on the origin and the destination server for the whole duration of the processing.
Your choice might also depend on your familiarity with tools or APIs for bulk import of data. For example, bcp.exe for MSSQL is a handy tool for exporting and importing data. Most other databases have similar tools.
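For illustration only (server names, logins, and table names are made up, and bcp.exe is assumed to be on the PATH), a bcp export/import pair driven from Perl could look like this; with -c the data is written as tab-separated character data:

```perl
# Export the source table into a character-mode staging file ...
system('bcp', 'SourceDb.dbo.source_table', 'out', 'staging.dat',
       '-c', '-S', 'db1server', '-U', 'user1', '-P', 'secret') == 0
    or die "bcp export failed: $?";

# ... then bulk load the staging file into the target table.
system('bcp', 'TargetDb.dbo.target_table', 'in', 'staging.dat',
       '-c', '-S', 'db2server', '-U', 'user2', '-P', 'secret') == 0
    or die "bcp import failed: $?";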
Upvotes: 1
Reputation: 504
I only have moderate experience with Perl, but that experience dealt with Perl and databases. As I understand it, it is typically best to connect to databases outside of loops -- that way you minimize the overhead of connecting, disconnecting, and reconnecting repeatedly.
So (see the sketch after the outline)...
connect to DB1
connect to DB2
fetch data from DB1
inside a while loop
use data from DB1 and create an update or insert statement
execute the sql in DB2
disconnect from DB1
disconnect from DB2
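A minimal sketch of the difference, with hypothetical DSN/table names and in-memory sample rows standing in for the data fetched from DB1:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical sample rows standing in for data fetched from DB1.
my @rows = ( { id => 1, amount => 10 }, { id => 2, amount => 20 } );

# Costly pattern: a brand-new connection (and statement) for every row.
for my $row (@rows) {
    my $dbh = DBI->connect('dbi:Oracle:DB2', 'user2', 'pass2', { RaiseError => 1 });
    $dbh->do('UPDATE target_table SET amount = ? WHERE id = ?',
             undef, $row->{amount}, $row->{id});
    $dbh->disconnect;
}

# Cheaper: connect once, prepare once, execute many times.
my $dbh2 = DBI->connect('dbi:Oracle:DB2', 'user2', 'pass2', { RaiseError => 1 });
my $upd  = $dbh2->prepare('UPDATE target_table SET amount = ? WHERE id = ?');
$upd->execute( $_->{amount}, $_->{id} ) for @rows;
$dbh2->disconnect;
```

Preparing the statement once and re-executing it per row follows the same principle as keeping the connections outside the loop: pay the setup cost once, not once per record.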
Upvotes: 2