Martin V.

Reputation: 3710

Spring, Hibernate - Batch processing of large amounts of data with good performance

Imagine you have a large amount of data in a database, approx. ~100 MB. We need to process all of it somehow (update it, or export it somewhere else). How can this task be implemented with good performance? How should transaction propagation be set up?

Example #1 (with bad performance):

@Singleton
public class ServiceBean {

    public void processAllData() {

        List<Entity> entityList = dao.findAll();

        for (Entity entity : entityList) {
            process(entity);
        }
    }

    private void process(Entity entity) {
        // data processing
        // saves data back (UPDATE operation) or exports it somewhere else (just READs from the DB)
    }

}

What could be improved here?

In my opinion :

  1. I would set the Hibernate batch size (see the Hibernate documentation on batch processing).
  2. I would separate ServiceBean into two Spring beans with different transaction settings. The method processAllData() should run outside a transaction, because it operates on a large amount of data and a potential rollback would not be 'quick' (I guess). The method process(Entity entity) would run in a transaction - rolling back in the case of a single entity is no big deal.
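A minimal sketch of the split described in point 2, assuming Spring's @Transactional support (class, method, and DAO names here are illustrative, not from the question):

```java
// Outer bean: deliberately NOT transactional; iterates over IDs only,
// to avoid holding one long transaction over all the data.
@Service
public class BatchProcessor {

    @Autowired
    private EntityProcessor entityProcessor;

    @Autowired
    private EntityDao dao;

    public void processAllData() {
        for (Long id : dao.findAllIds()) {
            entityProcessor.process(id);
        }
    }
}

// Inner bean: one short transaction per entity, so a failure rolls back
// only that entity's work.
@Service
public class EntityProcessor {

    @Autowired
    private EntityDao dao;

    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void process(Long id) {
        Entity entity = dao.findById(id);
        // ... process and update the entity ...
    }
}
```

Note that the per-entity method has to live on a separate bean: with Spring's default proxy-based AOP, a @Transactional method called from within the same class bypasses the proxy and gets no transaction.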

Do you agree? Any tips?

Upvotes: 2

Views: 8446

Answers (2)

atrain

Reputation: 9255

Here are two basic strategies:

  1. JDBC batching: set the JDBC batch size, usually somewhere between 20 and 50 (hibernate.jdbc.batch_size). If you are mixing and matching object C/U/D operations, make sure you have Hibernate configured to order inserts and updates, otherwise it won't batch (hibernate.order_inserts and hibernate.order_updates). And when doing batching, it is imperative to make sure you clear() your Session so that you don't run into memory issues during a large transaction.
  2. Concatenated SQL statements: implement the Hibernate Work interface and use your implementation class (or anonymous inner class) to run native SQL against the JDBC connection. Concatenate hand-coded SQL via semicolons (works in most DBs) and then process that SQL via doWork. This strategy allows you to use the Hibernate transaction coordinator while being able to harness the full power of native SQL.
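The batching configuration from strategy 1 might look like this in the Hibernate properties (the batch size of 30 is an illustrative value in the suggested 20-50 range):

```
hibernate.jdbc.batch_size=30
hibernate.order_inserts=true
hibernate.order_updates=true
```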

You will generally find that no matter how fast you can get your OO code, using DB tricks like concatenating SQL statements will be faster.
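Strategy 2 might be sketched as follows, using Hibernate's Session.doWork with an anonymous Work implementation (the table, column, and SQL here are illustrative; semicolon-concatenated statements work in most DBs but not all drivers):

```java
session.doWork(new Work() {
    @Override
    public void execute(Connection connection) throws SQLException {
        Statement stmt = connection.createStatement();
        try {
            // Hand-coded native SQL, concatenated via semicolons,
            // executed directly against the JDBC connection while
            // Hibernate still coordinates the transaction.
            stmt.execute(
                "UPDATE entity SET processed = 1 WHERE id = 1;" +
                "UPDATE entity SET processed = 1 WHERE id = 2;");
        } finally {
            stmt.close();
        }
    }
});
```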

Upvotes: 2

Alex Barnes

Reputation: 7218

There are a few things to keep in mind here:

  1. Loading all entities into memory with a findAll method can lead to an OutOfMemoryError.

  2. You need to avoid attaching all of the entities to a session - since every time Hibernate executes a flush it will need to dirty-check every attached entity. This will quickly grind your processing to a halt.

Hibernate provides a StatelessSession which you can use with a scrollable result set to scroll through entities one by one - docs here. You can then use this session to update each entity without ever attaching it to a session.
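A sketch of the StatelessSession approach (the query string and entity name are illustrative):

```java
StatelessSession session = sessionFactory.openStatelessSession();
Transaction tx = session.beginTransaction();

// FORWARD_ONLY scrolling streams rows instead of loading them all at once
ScrollableResults results = session.createQuery("from Entity")
        .scroll(ScrollMode.FORWARD_ONLY);

while (results.next()) {
    Entity entity = (Entity) results.get(0);
    // ... process the entity ...
    session.update(entity); // issues an UPDATE; nothing is cached or dirty-checked
}

results.close();
tx.commit();
session.close();
```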

The other alternative is to use a stateful session but clear the session at regular intervals as shown here.
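The stateful alternative with periodic clearing might look like this (the interval of 50 is illustrative; ideally it matches hibernate.jdbc.batch_size):

```java
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

ScrollableResults results = session.createQuery("from Entity")
        .scroll(ScrollMode.FORWARD_ONLY);

int count = 0;
while (results.next()) {
    Entity entity = (Entity) results.get(0);
    // ... process the entity ...
    if (++count % 50 == 0) {
        session.flush(); // push pending UPDATEs to the database
        session.clear(); // detach entities so dirty checking stays cheap
    }
}

tx.commit();
session.close();
```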

I hope this is useful advice.

Upvotes: 1
