Dhruv

Reputation: 10703

Bulk/Batch update using Spring Data JPA/Hibernate on Mysql

I am using MySQL with Spring Data JPA. In my use case, I have only one table, e.g. Customer (ID, FIRST_NAME, LAST_NAME). What I am trying to achieve is to update in batch/bulk, where the update statements are grouped together as shown below in the example, to reduce database round trips.

I have set all the batching-related properties.
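For reference, the batching properties typically meant here look like the sketch below (Spring Boot `application.properties` names assumed; the host, port, and database name are placeholders). Note that MySQL Connector/J only rewrites a JDBC batch into a single round trip when the `rewriteBatchedStatements` flag is set on the JDBC URL:

```properties
# Hibernate JDBC batching
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
# Without this flag, MySQL Connector/J sends batched statements one by one
spring.datasource.url=jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true
```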

But the result is that the update statements are not getting grouped. Logs from the MySQL general log:

2018-10-28T03:18:32.545233Z 1711 Query update CUSTOMER set FIRST_NAME='499997', LAST_NAME='499998' where id=499996;
2018-10-28T03:18:32.545488Z 1711 Query update CUSTOMER set FIRST_NAME='499998', LAST_NAME='499999' where id=499997;
2018-10-28T03:18:32.545809Z 1711 Query update CUSTOMER set FIRST_NAME='499999', LAST_NAME='500000' where id=499998;

Desired result (updates grouped into a single query, thus reducing the DB round trips):

2018-10-28T03:18:32.545233Z 1711 Query update CUSTOMER set FIRST_NAME='499997', LAST_NAME='499998' where id=499996; update CUSTOMER set FIRST_NAME='499998', LAST_NAME='499999' where id=499997; update CUSTOMER set FIRST_NAME='499999', LAST_NAME='500000' where id=499998;

My application needs to perform more than 100 million updates, and I suppose this could be the fastest way.

Upvotes: 1

Views: 3248

Answers (1)

paul.anasuya

Reputation: 5

I suggest you set the hibernate.jdbc.batch_size property as well. Following is a small example I have tried:

int entityCount = 50;
int batchSize = 25;

// entityManagerFactory() is assumed to return the application's EntityManagerFactory
EntityManager entityManager = entityManagerFactory()
    .createEntityManager();

EntityTransaction entityTransaction = entityManager
    .getTransaction();

try {
    entityTransaction.begin();

    for (int i = 0; i < entityCount; i++) {
        if (i > 0 && i % batchSize == 0) {
            // Commit the current batch (committing flushes the pending inserts),
            // then start a fresh transaction for the next batch
            entityTransaction.commit();
            entityTransaction.begin();

            // Detach the entities persisted so far to keep memory usage flat
            entityManager.clear();
        }

        Post post = new Post(
            String.format("Post %d", i + 1)
        );

        entityManager.persist(post);
    }

    entityTransaction.commit();
} catch (RuntimeException e) {
    if (entityTransaction.isActive()) {
        entityTransaction.rollback();
    }
    throw e;
} finally {
    entityManager.close();
}

Every time the iteration counter (i) reaches a multiple of the batchSize threshold, we flush the EntityManager and commit the database transaction. By committing the database transaction after every batch execution, we gain the following advantages:
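The commit cadence described above can be sketched independently of JPA. The helper below (a hypothetical illustration, not part of the answer's code) reproduces the `i > 0 && i % batchSize == 0` check and reports at which loop indices the intermediate commit fires; the final commit after the loop covers whatever remains:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchBoundaries {

    // Returns the loop indices at which the intermediate commit fires,
    // mirroring the i > 0 && i % batchSize == 0 check in the example.
    static List<Integer> commitPoints(int entityCount, int batchSize) {
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < entityCount; i++) {
            if (i > 0 && i % batchSize == 0) {
                points.add(i);
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // 50 entities, batches of 25: one intermediate commit at i = 25;
        // the final commit after the loop handles entities 25..49.
        System.out.println(commitPoints(50, 25));  // prints [25]
        System.out.println(commitPoints(100, 25)); // prints [25, 50, 75]
    }
}
```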

  • We avoid long-running transactions which are detrimental to MVCC relational database systems.
  • We make sure that if we have a failure, we don’t lose the work done by the batch jobs that have previously executed successfully.

The EntityManager is cleared after every batch execution so that we don’t keep on accumulating managed entities which can cause several problems:

  • If the number of entities to be persisted is huge, we risk running out of memory.
  • The more entities we accumulate in the Persistence Context, the slower the flush becomes. So, it’s good practice to keep the Persistence Context as slim as possible.

If an exception is thrown, we must make sure to roll back the currently running database transaction. Failing to do so can cause many problems, since the database might still think the transaction is open, and locks might be held until the transaction is ended by a timeout or by the DBA.

In the end, we need to close the EntityManager so that we can clear the context and deallocate Session-level resources.

Upvotes: -3
