Tiny
Tiny

Reputation: 27899

Bulk delete in JPA 2.1 using the criteria API

Given the following code that deletes rows in a batch.

List<City> list = ...
int i=0;

for(City city:list)
{
    if(++i%49==0)
    {
        entityManager.flush();
    }

    entityManager.remove(city);
}

The JPA 2.1 criteria API provides CriteriaDelete to perform a bulk delete. Accordingly, the following code executes a DELETE FROM city WHERE id IN(...) query.

CriteriaBuilder criteriaBuilder=entityManager.getCriteriaBuilder();
CriteriaDelete<City> criteriaDelete = criteriaBuilder.createCriteriaDelete(City.class);

Root<City> root = criteriaDelete.from(City.class);
criteriaDelete.where(root.in(list));
entityManager.createQuery(criteriaDelete).executeUpdate();

But this should not be equivalent of the first case. What is the equivalent of the first case? It should perform deletion in a batch.

Upvotes: 2

Views: 8519

Answers (2)

Vlad Mihalcea
Vlad Mihalcea

Reputation: 153850

In your case, it might be that the Criteria API 2.1 Bulk Delete performs better:

CriteriaBuilder criteriaBuilder=entityManager.getCriteriaBuilder();
CriteriaDelete<City> criteriaDelete = criteriaBuilder.createCriteriaDelete(City.class);

Root<City> root = criteriaDelete.from(City.class);
criteriaDelete.where(root.in(list));
entityManager.createQuery(criteriaDelete).executeUpdate();

For batch delete, you don't need Criteria API at all. In fact, you can leave the code untouched:

List<City> list = ...
int i=0;

for(City city:list {
    if(++i%49==0) {
        entityManager.flush();
    }
    entityManager.remove(city);
}

What you need to do is to enable JDBC batch updates like this:

<property name="hibernate.jdbc.batch_size" value="50"/>

However, the entity-level batching example needs some improvement. As I explained in this article, the best way to do batch processing with JPA and Hibernate if you process hundreds of entities, it is a good idea to commit the transaction as well during the batch:

int entityCount = 50;
int batchSize = 25;

EntityManager entityManager = null;
EntityTransaction transaction = null;

try {
    entityManager = entityManagerFactory()
        .createEntityManager();

    transaction = entityManager.getTransaction();
    transaction.begin();

    for ( int i = 0; i < entityCount; ++i ) {
        if ( i > 0 && i % batchSize == 0 ) {
            entityManager.flush();
            entityManager.clear();

            transaction.commit();
            transaction.begin();
        }

        Post post = new Post( 
            String.format( "Post %d", i + 1 ) 
        );
        entityManager.persist( post );
    }

    transaction.commit();
} catch (RuntimeException e) {
    if ( transaction != null && 
         transaction.isActive()) {
        transaction.rollback();
    }
    throw e;
} finally {
    if (entityManager != null) {
        entityManager.close();
    }
}

This way, you will avoid a long-running transaction that can hurt performance in both 2PL and MVCC-based relational databases.

More, consider that you need to update 10K entries and delete 3k rows, would you really want to roll back everything just because the last SQL statement failed?

The atomicity in ACID is great for OLTP where you only touch a small subset of data. For OLAP, or bulk processing, it's better to use bulk updates and deletes instead.

Upvotes: 1

Chris
Chris

Reputation: 21145

The equivalent would be to execute a query to delete a single entity in a for loop, something like:

for(City city:list)
{
    if(++i%49==0)
    {
        entityManager.flush();
    }

    CriteriaBuilder criteriaBuilder=entityManager.getCriteriaBuilder();
    CriteriaDelete<City> criteriaDelete = criteriaBuilder.createCriteriaDelete(City.class);

    Root<City> root = criteriaDelete.from(City.class);
    criteriaDelete.where(root.equal(city));
    entityManager.createQuery(criteriaDelete).executeUpdate();
}

Each of these 50 statements might not get put into the same batch - it depends on if your JPA provider supports batching or not. The loop seems less efficient, as you end up with X statements rather than a single DELETE FROM city WHERE id IN(...) you did with the original bulk delete.

Upvotes: 1

Related Questions