n0obiscuitz
n0obiscuitz

Reputation: 573

Is JDBC multi-threaded insert possible?

I'm currently working on a Java project which i need to prepare a big(to me) mysql database. I have to do web scraping using Jsoup and store the results into my database as well. As i estimated, i will have roughly 1,500,000 to 2,000,000 records to be inserted. In my first trial, i just use a loop to insert these records and it takes me one week to insert about 1/3 of my required records, which is too slow i think. Is it possible to make this process multi-threaded, so that i can have my records split into 3 sets, say 500,000 records per set, and then insert them into one database( one table specifically)?

Upvotes: 6

Views: 6643

Answers (5)

duffymo
duffymo

Reputation: 308743

You can chunk your record set into batches and do this, but perhaps you should think about other factors as well.

Are you doing a network round trip for each INSERT? If yes, latency could be the real enemy. Try batching those requests to cut down on network traffic.

Do you have transactions turned on? If yes, the size of the rollback log could be the problem.

I'd recommend profiling the app server and the database server to see where the time is being spent. You can waste a lot of time guessing about the root cause.

Upvotes: 1

skaffman
skaffman

Reputation: 403441

Multi-threading isn't going to help you here. You'll just move the contention bottleneck from your app server to the database.

Instead, try using batch-inserts instead, they generally make this sort of thing orders of magnitude faster. See "3.4 Making Batch Updates" in the JDBC tutorial.

Edit: As @Jon commented, you need to decouple the fetching of the web pages from their insertion into the database, otherwise the whole process will go at the speed of the slowest operation. You could have multiple threads fetching web pages, which add the data to a queue data structure, and then have a single thread draining the queue into the database using a batch insert.

Upvotes: 7

Sam
Sam

Reputation: 6890

I think multi thread approch usefull for your issue but you have to using a connection pool such as C3P0 or Tomca 7 Connetcion pool for more performance.

Another solution is using a batch-operation provider such as Spring-batch, exist anothers utility for batch operation also.

Another solution is using a PL/SQl Procedure with a input structure parameter.

Upvotes: 0

Jugal Shah
Jugal Shah

Reputation: 3701

You can insert these records in different threads provided they do use different primary key values.

You should also look at Spring Batch which I believe will be useful in your case.

Upvotes: 1

Jon Martin Solaas
Jon Martin Solaas

Reputation: 767

Just make sure two (or more) threads doesn't use the same connection at the same time, using a connection pool resolves that. c3po and apache dbcp comes to mind ...

Upvotes: 1

Related Questions