Orlan

Reputation: 561

Java & MySQL Performance Issue

So I have a bit of a performance problem. I have made a Java program that constructs a database. The problem is loading in the data: I am loading 5,000 files into a SQL database. When the program starts, it can process about 10% of the files in 10 minutes, but it gets much slower as it progresses. Currently, at 28%, it is going to finish in 16 hours at its current rate, and that rate is still slowing down considerably.

My question is: why does the program get progressively slower as it runs, and how can I fix that?

EDIT: I have two versions. One is threaded (capped at 5 threads) and one is not. The difference between the two is negligible. I can post the code again if anyone likes, but I took it out because I am now fairly certain that the bottleneck is MySQL (also appropriately re-tagged). I went ahead and used batch inserts. This did cause an initial increase in speed, but once again, after processing about 30% of the data, it drops off quickly.

So SQL Points

  1. The engine for all 64 tables is InnoDB version 10.
  2. The tables have about 300k rows at this point (~30% of the data).
  3. All tables have one "joint" (composite) primary key: an id and a date.
  4. Looking at MySQL Workbench, I see that there is a query per thread (5 queries).
  5. I am not sure of the unit of time (I'm just reading it from MySQL Administrator), but the queries that check whether a file is already inserted are taking ~300. (This query should be fast, as it is a SELECT MyIndex FROM MyTable WHERE Date = date LIMIT 1.) Since I have been starting and stopping the program, I built in this check to see if a file was already inserted. That way I can restart it after each change and see what improvement there is, if any, without starting the process over.
  6. I am fairly certain that the degradation of performance is related to the tables' sizes. (I can stop and start the program now and the process remains slow. It is only when the tables are small that the process goes at an acceptable speed.)
  7. Please, please ask and I will post whatever information you need.

DONE! Well, I just let it run for the 4 days it needed. Thank you all for the help.

Cheers,

--Orlan

Upvotes: 0

Views: 2737

Answers (5)

OldCurmudgeon

Reputation: 65793

You have several good tried and tested options for speeding up database access.

  1. Use an ExecutorService for your threads. This may not help speed-wise but it will help you implement the following.
  2. Hold a ThreadLocal Connection instead of making a new connection for every file. Also, obviously, don't close it.
  3. Create a single PreparedStatement instead of making a new one every time around.
  4. Batch up your statement executions.
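A minimal sketch of points 2–4 together, assuming a hypothetical table MyTable(id, date, data) and made-up connection details (URL, user, password are placeholders, not from the question):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInserter {
    // Hypothetical connection details -- substitute your own.
    static final String URL = "jdbc:mysql://localhost:3306/mydb";
    static final String INSERT_SQL =
            "INSERT INTO MyTable (id, date, data) VALUES (?, ?, ?)";

    // One Connection per thread, created lazily and reused for every file.
    static final ThreadLocal<Connection> CONN = ThreadLocal.withInitial(() -> {
        try {
            return DriverManager.getConnection(URL, "user", "password");
        } catch (SQLException e) {
            throw new RuntimeException(e);
        }
    });

    // Reuse one PreparedStatement per file and batch the rows,
    // instead of building and executing one statement per row.
    static void insertFile(int id, java.sql.Date date, String[] rows)
            throws SQLException {
        try (PreparedStatement ps = CONN.get().prepareStatement(INSERT_SQL)) {
            for (String row : rows) {
                ps.setInt(1, id);
                ps.setDate(2, date);
                ps.setString(3, row);
                ps.addBatch();
            }
            ps.executeBatch(); // one round-trip instead of rows.length
        }
    }
}
```

Each worker thread then calls insertFile per file; the connection is opened once per thread and never closed between files.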

Upvotes: 0

maasg

Reputation: 37435

Q1: Why does the program get progressively slower?

In your problem space, you have 2 systems interacting: a producer that reads from the file system and produces data, and a consumer that transforms that data into records and stores them in the db. Your code is currently hard-linking these two processes, and your system runs at the speed of the slower of the two.

In your program, you have a fixed arrival rate (1/sec: the wait when you have more than 10 threads running). If you have indexes on the tables being filled, inserts will take longer as the tables grow. That means that while your arrival rate is fixed at 1/sec, your exit rate is continuously decreasing. Therefore, you will be creating more and more threads that share the same CPU/IO resources and getting fewer things done per unit of time. Creating threads is also a very expensive operation.

Q2: Could it have to do with how I am constructing the queries from Strings?

Only partially. Your string manipulation is a fixed cost in the system: it adds to the cost of servicing one request. But string operations are CPU-bound, and your problem is I/O-bound, meaning that improving the string handling (which you should do anyway) will only marginally improve the performance of the system. (See Amdahl's Law.)

Q3: how to fix that (performance issue)

(FileReaderProducer) --> queue --> (DBBulkInsertConsumer)

  • Don't create new Threads yourself. Use the facilities provided by the java.util.concurrent package, like the ExecutorService or the CompletionService mentioned above. For a "bare" thread pool, use the Executors factory.

  • For this specific problem, having 2 separate thread pools (one for the producers, one for the consumers) will allow you to tune your system for best performance. File reading improves with parallelization (up to your I/O bound), but db inserts do not (I/O + indexes + relational consistency checks), so you might need to limit the number of file-reading threads (3-5) to match the insertion rate (2-3). You can monitor the queue size to evaluate your system's performance.

  • Use JDBC bulk inserts: http://viralpatel.net/blogs/batch-insert-in-java-jdbc/
  • Use StringBuilder instead of String concatenation. Strings in Java are immutable. That means that every time you do myString += ",";, you are creating a new String and making the old String eligible for garbage collection. In turn, this increases the garbage-collection overhead.
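The pipeline above can be sketched with a BlockingQueue between two pools. The db insert is stubbed out here (the consumer just counts batches), since the point is the decoupling, not the JDBC; all names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class Pipeline {
    static final String POISON = "EOF"; // sentinel that ends the consumer

    public static int run(int files) throws Exception {
        // Bounded queue: producers block when the consumer falls behind.
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(100);
        ExecutorService producers = Executors.newFixedThreadPool(4);
        ExecutorService consumer = Executors.newSingleThreadExecutor();

        // Consumer: drain the queue in batches of 50 and "insert" them.
        Future<Integer> inserted = consumer.submit(() -> {
            int count = 0;
            List<String> batch = new ArrayList<>();
            while (true) {
                String record = queue.take();
                if (record.equals(POISON)) break;
                batch.add(record);
                if (batch.size() == 50) { // would be one bulk insert
                    count += batch.size();
                    batch.clear();
                }
            }
            return count + batch.size(); // flush the final partial batch
        });

        // Producers: read files (simulated) and hand records to the queue.
        for (int i = 0; i < files; i++) {
            final int n = i;
            producers.submit(() -> {
                try {
                    queue.put("record-" + n);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        producers.shutdown();
        producers.awaitTermination(1, TimeUnit.MINUTES);
        queue.put(POISON); // tell the consumer we're done
        int total = inserted.get();
        consumer.shutdown();
        return total;
    }
}
```

Watching the queue fill up (producers faster) or stay empty (consumer faster) tells you which pool to resize.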

Upvotes: 1

Peter Lawrey

Reputation: 533442

Most databases load data more efficiently if,

  • you load in batches of data,
  • you load in a relatively small number of threads, e.g. one or two.

As you add more threads, you add more overhead, so you should expect it to be slower.

Try using an ExecutorService with a fixed-size pool, e.g. 2-4, and try loading the data in batches of, say, 100 at a time in a transaction.
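A minimal sketch of the "100 at a time in a transaction" part, assuming a hypothetical single-column table MyTable(data); autocommit is switched off and the connection commits once per batch:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class TransactionalLoader {
    static final int BATCH = 100;

    // The batch math: how many commits a load of `rows` rows needs.
    static int commitsNeeded(int rows) {
        return (rows + BATCH - 1) / BATCH;
    }

    // Insert all rows on one connection, committing every BATCH rows.
    static void load(Connection conn, List<String> rows) throws SQLException {
        conn.setAutoCommit(false); // group inserts into explicit transactions
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO MyTable (data) VALUES (?)")) {
            int n = 0;
            for (String row : rows) {
                ps.setString(1, row);
                ps.addBatch();
                if (++n % BATCH == 0) {
                    ps.executeBatch();
                    conn.commit(); // one transaction per 100 rows
                }
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}
```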

Upvotes: 0

alexey28

Reputation: 5210

You can use a direct insert from file to database (read here). It works faster. When I did the same for Postgres, I got a 20x performance increase.
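The "direct insert from file" is presumably MySQL's LOAD DATA LOCAL INFILE. A sketch of issuing it over JDBC, with placeholder file and table names (and assuming the server and driver allow LOCAL infile):

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

public class DirectLoad {
    // Build a LOAD DATA LOCAL INFILE statement for a CSV file.
    // The path and table name here are placeholders.
    static String loadSql(String csvPath, String table) {
        return "LOAD DATA LOCAL INFILE '" + csvPath + "' "
             + "INTO TABLE " + table + " "
             + "FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'";
    }

    // Execute the bulk load on an open connection.
    static void load(Connection conn, String csvPath, String table)
            throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(loadSql(csvPath, table));
        }
    }
}
```

This bypasses per-row statement parsing entirely, which is why it tends to beat even batched INSERTs.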

Also, download the YourKit profiler and profile your application for performance. Then you will see what takes your time.

Upvotes: 1

Ewald

Reputation: 5751

There are a number of things in your code that could contribute to the speed problems, and you are correct in suspecting that the Strings play a role.

Take for example this code:

```java
String rowsString = ""; // - an extra 1 to not have a comma at the end
for (int i = 0; i <= numberOfRows - 3; i++) {
    rowsString += "(DATA), \n";
}
rowsString += "(DATA)";
```

Depending on how many rows there are, this is a potential bottleneck and memory hog. I think it's best if you use a StringBuilder here. I see a lot of String manipulation that is better suited to StringBuilder. Might I suggest you read up on String handling a bit and optimise these places, especially where you += Strings?
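For illustration, the snippet above rewritten with a StringBuilder: same output and the same (odd-looking) loop bound, but one mutable buffer instead of a new String per iteration:

```java
public class RowsBuilder {
    // Builds "(DATA), \n" repeated, with no trailing comma -- the same
    // result as the += version, without intermediate String copies.
    static String buildRows(int numberOfRows) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i <= numberOfRows - 3; i++) {
            sb.append("(DATA), \n");
        }
        sb.append("(DATA)");
        return sb.toString();
    }
}
```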

Then the next question is: how is your table designed? There could be things making your inserts slow, like incorrect default lengths for varchars, no indexes, or too many indexes, etc.

Upvotes: 0
