Marconline

Reputation: 1441

Performance when migrating a lot of data from the same DB to different DBs using application business logic (BL)

I have this problem. I have an application that, every 15 minutes, downloads a file from the net and bulk-copies it into a DB.

Users of my app subscribe to this file, i.e. they signal us that they want to "copy" this file into their own databases. I wrote "copy" because I have to apply some business logic to the data before putting it into their databases. This business logic is customer-dependent.

The problem is that the source database changes by something like 100,000 rows every 15 minutes (some records are new, some are updated and some are deleted).

How would you tackle this problem? I tried the straightforward approach:

  1. foreach customer
  2. take the new data --> apply the business logic for that customer --> insert into his DB
  3. take the updated data --> apply the BL --> update his DB
  4. take the deleted data --> apply the BL --> delete from his DB
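In pseudocode-ish C#, the loop is shaped roughly like this (a much simplified sketch: the Customer/Row types and the ApplyBusinessLogic / insert / update / delete methods are placeholders, not my real code):

    using System.Collections.Generic;

    // Simplified sketch of the current per-customer loop.
    // Customer, Row and the helper methods are placeholders for the real code.
    void ProcessCustomers(IEnumerable<Customer> customers,
                          IList<Row> newRows, IList<Row> updatedRows, IList<Row> deletedRows)
    {
        foreach (var customer in customers)
        {
            foreach (var row in newRows)
                InsertIntoCustomerDb(customer, ApplyBusinessLogic(customer, row)); // one INSERT per row

            foreach (var row in updatedRows)
                UpdateCustomerDb(customer, ApplyBusinessLogic(customer, row));     // one UPDATE per row

            foreach (var row in deletedRows)
                DeleteFromCustomerDb(customer, row);                               // one DELETE per row
        }
    }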

But it takes too long, far more than 15 minutes. Sometimes it takes an hour for a single user.

What would you do?

Thanks, Marco

Upvotes: 0

Views: 113

Answers (2)

Panetta

Reputation: 31

It's difficult to say without seeing the code, but you could try profiling it with something like ANTS Performance Profiler to identify where the slowdown is occurring. If you don't want to use that, I believe Visual Studio 2010 includes a profiling tool.

Obviously you would want to run your profiling against a debug or staging build rather than your production system.
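If setting up a profiler is too much, even coarse timing around each phase of the job will usually tell you whether the time goes into reading the data, applying the business logic, or writing to the customers' databases. A rough sketch (the phase methods are just placeholders):

    using System;
    using System.Diagnostics;

    // Coarse phase timing: wrap each stage of the 15-minute job in a Stopwatch
    // to see which one dominates before reaching for a full profiler.
    var sw = Stopwatch.StartNew();
    var changes = LoadChangedRows();                           // placeholder: read the downloaded data
    Console.WriteLine("Load:      {0} ms", sw.ElapsedMilliseconds);

    sw.Restart();
    var transformed = ApplyBusinessLogic(customer, changes);   // placeholder: customer-dependent BL
    Console.WriteLine("Transform: {0} ms", sw.ElapsedMilliseconds);

    sw.Restart();
    WriteToCustomerDb(customer, transformed);                  // placeholder: inserts/updates/deletes
    Console.WriteLine("Write:     {0} ms", sw.ElapsedMilliseconds);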

If you believe that it's a database speed issue, you might want to look at how you're doing your inserts and whether any indexes or triggers are affecting DB insert speeds.
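For example, if the current code issues one INSERT/UPDATE/DELETE per row against each customer's database, bulk-loading the already-transformed rows into a staging table with SqlBulkCopy and applying them server-side in one set-based statement is usually much faster. A rough sketch (the connection string, staging table and stored procedure names are placeholders):

    using System.Data;
    using System.Data.SqlClient;

    // Sketch: bulk-load the transformed rows into a staging table on the customer DB,
    // then let the server apply them in a single set-based operation.
    void BulkWrite(string customerConnectionString, DataTable transformedRows)
    {
        using (var connection = new SqlConnection(customerConnectionString))
        {
            connection.Open();

            using (var bulk = new SqlBulkCopy(connection))
            {
                bulk.DestinationTableName = "dbo.ImportStaging"; // placeholder staging table
                bulk.BatchSize = 5000;
                bulk.WriteToServer(transformedRows);
            }

            // One set-based statement (a stored procedure or a MERGE) instead of
            // thousands of single-row round trips.
            using (var apply = new SqlCommand("EXEC dbo.ApplyImportStaging", connection))
            {
                apply.CommandTimeout = 0;
                apply.ExecuteNonQuery();
            }
        }
    }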

Upvotes: 0

Alexey Raga

Reputation: 7525

100,000 rows doesn't sound like too much.

It depends on your business logic, but if it is mostly data transformation you could consider building an SSIS package (in MS SQL Server terms; other RDBMSs have similar tools) to import your data.

You can also think about taking advantage of parallelism, say, have several threads (or even virtual machines) working for you: just partition the file and process all the partitions simultaneously. Even implementing a simple map/reduce algorithm may help.
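For instance, a rough sketch with .NET 4's Parallel.ForEach, assuming the rows in a partition can be processed independently (the Row/Customer types and the two helper methods are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;

    // Sketch: split the changed rows into independent partitions and process them in parallel.
    // Assumes the business logic for one row does not depend on other rows in the batch.
    IEnumerable<List<Row>> Partition(IList<Row> rows, int partitionCount)
    {
        return rows
            .Select((row, index) => new { row, index })
            .GroupBy(x => x.index % partitionCount)
            .Select(g => g.Select(x => x.row).ToList());
    }

    void ProcessInParallel(IList<Row> changedRows, Customer customer)
    {
        var partitions = Partition(changedRows, Environment.ProcessorCount);

        Parallel.ForEach(partitions, partition =>
        {
            foreach (var row in partition)
            {
                var transformed = ApplyBusinessLogic(customer, row); // placeholder BL
                WriteToCustomerDb(customer, transformed);            // placeholder write
            }
        });
    }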

In any case, do some performance measurement: you really want to know WHY your data processing is SO slow. Probably something in your code can be optimized A LOT.

100,000 rows an hour is ridiculously slow; something is going wrong there (unless you have heavy and super-complicated business logic you need to perform on each row, of course).

Upvotes: 1
