Reputation: 1441
I have this problem: I have an application that, every 15 minutes, downloads a file from the net and bulk-copies it into a DB.
Users of my app subscribe to this file, meaning they signal us that they want this file "copied" into their databases. I wrote "copy" because I have to apply some business logic to the data before putting it into their databases, and this business logic is customer-dependent.
The problem is that the source database sees something like 100,000 changed rows every 15 minutes (some are new records, some are updated, and some are deleted).
How would you tackle this problem? I tried following the normal development process, but it takes far too long, nowhere near 15 minutes. Sometimes it takes an hour for a single user.
What would you do?
Thanks, Marco
Upvotes: 0
Views: 113
Reputation: 31
It's difficult to say without seeing the code, but you could try profiling it with something like ANTS Performance Profiler to identify where the slowdown is occurring. If you don't want to use that, I believe Visual Studio 2010 includes a profiling tool.
Obviously you would want to run your profiling against a debug or staging build rather than your production system.
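The tools above are .NET-specific, but the idea works in any stack: time each stage of the pipeline and see where the minutes actually go. Here is a minimal sketch in Python, with stand-in stage functions (download_file, apply_business_logic, and bulk_insert are placeholders for illustration, not anything from the question):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage):
    """Print how long the wrapped block takes."""
    start = time.perf_counter()
    try:
        yield
    finally:
        print(f"{stage}: {time.perf_counter() - start:.2f}s")

# Stand-ins for the real pipeline stages.
def download_file():
    return [{"id": i} for i in range(100_000)]

def apply_business_logic(rows):
    return [dict(r, processed=True) for r in rows]

def bulk_insert(rows):
    pass  # the real code would write to the customer DB

with timed("download"):
    rows = download_file()
with timed("business logic"):
    rows = apply_business_logic(rows)
with timed("insert"):
    bulk_insert(rows)
```

Even this crude stage-level timing will usually tell you whether to dig into the business logic or the database before reaching for a full profiler.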
If you believe that it's a database speed issue, you might want to look at how you're doing your inserts and whether any indexes or triggers are affecting DB insert speeds.
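A common culprit is issuing one statement (and often one transaction) per row. Batching the inserts into a single transaction, and disabling nonessential indexes or triggers during the load, usually helps a lot. A minimal sketch of the pattern using Python's built-in sqlite3 module, purely as an illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"name-{i}") for i in range(100_000)]

# Slow pattern: one statement and one commit per row.
# for r in rows:
#     conn.execute("INSERT INTO items VALUES (?, ?)", r)
#     conn.commit()

# Faster pattern: one batched statement inside a single transaction.
with conn:
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM items").fetchone()[0])
```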
Upvotes: 0
Reputation: 7525
100,000 rows doesn't sound like too much.
It depends on your business logic, but if it's mostly data transformation you could consider building an SSIS package (in MS SQL Server terms; other RDBMSs have similar tools) to import your data.
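Conceptually, an SSIS data flow streams rows from a source through transformations into a bulk destination, one buffer at a time, so memory stays bounded. A rough, tool-agnostic sketch of that shape in Python (the CSV layout, chunk size, and customer transform are all assumptions for illustration):

```python
import csv
import itertools
import sqlite3

CHUNK = 10_000  # rows per buffer, roughly like an SSIS data-flow buffer

def transform(row):
    # Hypothetical customer-dependent business logic.
    return (int(row["id"]), row["name"].strip().upper())

def etl(path, conn):
    """Stream the file through transform + bulk load, one chunk at a time."""
    with open(path, newline="") as f:
        rows = map(transform, csv.DictReader(f))
        while chunk := list(itertools.islice(rows, CHUNK)):
            with conn:  # one transaction per chunk
                conn.executemany(
                    "INSERT OR REPLACE INTO items (id, name) VALUES (?, ?)",
                    chunk,
                )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# etl("subscribed_file.csv", conn)  # the path is illustrative
```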
You can also think about taking advantage of parallelism, say, having several threads (or even several machines) working for you: just partition the file and process all the partitions simultaneously, as in the sketch below. Even implementing a simple map/reduce algorithm may help.
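A minimal sketch of that partitioning idea using Python's standard concurrent.futures; the process_partition logic and the partition count are assumptions for illustration:

```python
from concurrent.futures import ProcessPoolExecutor

def process_partition(rows):
    """Stand-in for applying the business logic to one slice of the file."""
    return [r * 2 for r in rows]

def partition(seq, n):
    """Split seq into n roughly equal slices."""
    size = (len(seq) + n - 1) // n
    return [seq[i:i + size] for i in range(0, len(seq), size)]

if __name__ == "__main__":
    rows = list(range(100_000))
    with ProcessPoolExecutor() as pool:
        # "map" step: each partition is processed in its own process.
        parts = list(pool.map(process_partition, partition(rows, 8)))
    # "reduce" step: merge the per-partition results.
    total = [r for part in parts for r in part]
    print(len(total))
```

Note this only pays off if the per-row business logic, rather than the database, is the bottleneck.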
In any case, do some performance measurement; you really want to know WHY your data processing is SO slow. Probably something in your code can be optimized a lot.
100,000 rows an hour is ridiculously slow; something wrong is going on there (unless, of course, you have heavy and super-complicated business logic to perform on each row).
Upvotes: 1