Reputation: 561
So I have a bit of a performance problem. I have made a java program that constructs a database. The problem is when loading in the data. I am loading in 5,000 files into a sql Database. When the program starts off, it can process about 10% of the files in 10 minutes however it gets much slower as it progresses. Currently at 28% it is going to finish in 16 hours at its current rate. However that rate is slowing down considerably.
My question is why does the program get progressively slower as it runs and how to fix that.
EDIT: I have two versions. One is threaded (capped at 5 threads) and one is not. The difference between the two is negligible. I can post the code again if any one likes, but I took it out because I am now fairly certain that the bottle neck is the MySQL (Also appropriately re tagged). I went ahead and used batch inserts. This did cause an initial increase in speed but once again after processing about 30% of the data it does drop of quickly.
So SQL Points
DONE! Well I just let it run for the 4 Days it needed to. Thank you all for the help.
Cheers,
--Orlan
Upvotes: 0
Views: 2737
Reputation: 65793
You have several good tried and tested options for speeding up database access.
ExecutorService
for your threads. This may not help speed-wise but it will help you implement the following.ThreadLocal
Connection
instead of making a new connection for every file. Also, obviously, don't close it.PreparedStatement
instead of making a new one every time around.Upvotes: 0
Reputation: 37435
Q1: Why does the program get progressively slower?
In your problem space, you have 2 systems interacting: a producer that reads from the file system and produces data, and a consumer that transforms that data into records and stores them in the db. Your code is currently hard linking these two processes and your system works at the slowest speed of the two.
In your program you have a fixed arrival rate (1/sec - the wait when you've more than 10 threads running). If you have indexes in the tables being filled, as the table grows bigger, inserts will take longer. That means that while your arrival rate is fixed at 1/sec, your exit rate is continuosly increasing. Therefore, you will be creating more and more threads that share the same CPU/IO resources and getting less things done per unit of time. Creating threads is also a very expensive operation.
Q2: Could it have to do with how I am constructing the queries from Strings?
Only partially. Your string manipulation is a fixed cost in the system. It increases the cost it takes to service one request. But string operations are CPU bounded and your problem is I/O bounded, meaning that improving the string handling (that you should) will only marginally improve the performance of the system. (See Amdahl's Law).
Q3: how to fix that (performance issue)
(FileReaderProducer) --> queue --> (DBBulkInsertConsumer)
Don't create new Threads. Use the facilities provided by the java.util.concurrent package, like the executor service or the Completion service mentioned above. For a "bare" threadpool, use the Executors factory.
For this specific proble, having 2 separate thread pools, (one for the consumer, one for the producer) will allow you to tune your system for best performance. File reading improves with parallelization (up to your I/O bound), but db inserts are not (I/O + indexes + relational consistency checks), so you might need to limit the amount of file reading threads (3-5) to match the insertion rate (2-3). You can monitor the queue size to evaluate your system performance.
myString += ","
; you are creating a new String and making the old String elegible for garbage collection. In turn, this increases garbage collection performance penalties.Upvotes: 1
Reputation: 533442
Most databases load data more efficiently if,
As you add more threads you add more overhead, so you expect it to be slower.
Try using an ExecutorService with a fixed size pool e.g. 2-4 and try loading the data in batches of say 100 at a time in a transaction.
Upvotes: 0
Reputation: 5210
You can use direct insert from file to database (read here). It works faster. When I do same for postgres I get 20 times performance increase.
And also dounload Your kit profiler and profile your application for performance. Than you will see what takes your time.
Upvotes: 1
Reputation: 5751
There's a number of things in your code that could contribute to the speed problems and you are correct in suspecting that the Strings play a role.
Take for example this code:
String rowsString = ""; // - an extra 1 to not have a comma at the end for (int i = 0; i <= numberOfRows - 3; i++) { rowsString += "(DATA), \n"; } rowsString += "(DATA)";
Depending on how many rows there are, this is a potential bottle-neck and memory hog. I think it's best if you use a StringBuilder here. I see a lot of String manipulation that are better suited to StringBuilders. Might I suggest you read up on String handling a bit and optimise these, especially where you += Strings?
Then the next question is how is your table designed? There could be things making your inserts slow, like incorrect default lengths for varchars, no indexes or too many indexes etc.
Upvotes: 0