Marnee

Reputation: 619

High CPU utilization when doing thousands of inserts with Hibernate

We have recently implemented DB binding using Hibernate with EntityManager (no Spring) to write records to a database. To keep things simple, I will discuss only the variation of the process that does inserts. (Another, very similar, process updates an existing record once to set a status, but otherwise just inserts a bunch of records.)

This process can insert as many as 10,000 records per transaction, though the average is probably less than half that. We might have a few instances of this process running in different threads at the same time in the same JVM.

We had a production issue where the service the process was running under was pegging out all 24 cores on the machine. (They added 12 just to try to accommodate this.) We have narrowed this high utilization down to Hibernate.

I have spent days researching and cannot find anything that might improve our performance, except using hibernate.jdbc.batch_size along with hibernate.order_inserts. Unfortunately, we are using IDENTITY as our generation strategy, so Hibernate cannot batch those inserts.

Beyond that, I have not found any other performance tips for doing large numbers of inserts. (I've seen many tips regarding reads, updates, and deletes, but very few for inserts.)
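
For reference, these are the settings in question, roughly as they would sit in our persistence unit (the values shown are illustrative, not our exact configuration):

<!-- Settings we tried; with IDENTITY ids, Hibernate still cannot batch the inserts. -->
<property name="hibernate.jdbc.batch_size" value="50"/>
<property name="hibernate.order_inserts" value="true"/>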

We have a root JobPO object. We simply call merge on that and all the inserts are handled via the cascade annotations. We need to do this in a single transaction.

We have only 8 different tables that we are inserting into, but the hierarchy of records is a bit complicated.
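
To give a feel for the mapping, the root looks something like this (the child entity JobItemPO and the field names are made up for illustration; the real hierarchy fans out into 8 tables):

import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class JobPO {

    // IDENTITY generation is why Hibernate cannot batch our inserts.
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // One of several cascaded collections; a single merge/persist of the root
    // inserts the whole tree.
    @OneToMany(mappedBy = "job", cascade = CascadeType.ALL)
    private List<JobItemPO> items = new ArrayList<>();

    // other fields, collections, getters and setters omitted
}

The save itself is just: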

public void saveOrUpdate(Object dataHierarchyRoot) {
    final EntityManager entityManager = entityManagerFactory.createEntityManager();
    final EntityTransaction transaction = entityManager.getTransaction();

    try {
        transaction.begin();

        // This single call may result in inserting up to 10K records
        entityManager.merge(dataHierarchyRoot);
        transaction.commit();
    } catch (final Throwable e) {
        // error handling redacted for brevity
    } finally {
        entityManager.close();
    }
}

We create the EntityManagerFactory only once.
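
Concretely, it is held in a single shared instance for the life of the JVM, along these lines (the holder class and the persistence-unit name "jobs-pu" are placeholders, not our actual code):

import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

public final class PersistenceHolder {

    // Built once at class-load time; every saveOrUpdate() call gets its own
    // short-lived EntityManager from this single factory.
    private static final EntityManagerFactory ENTITY_MANAGER_FACTORY =
            Persistence.createEntityManagerFactory("jobs-pu");

    private PersistenceHolder() {
    }

    public static EntityManagerFactory get() {
        return ENTITY_MANAGER_FACTORY;
    }
}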

Any ideas?


Upvotes: 2

Views: 5354

Answers (3)

Vlad Mihalcea

Reputation: 153690

If you open and close a Session for every object you are about to save, then for 10k objects you are practically opening and closing 10k Sessions, flushing 10k times and going to the database for 10k round-trips.

You should at least batch multiple entities together:

int i = 0;
entityManager.getTransaction().begin();

// MyEntity stands in for your mapped entity type (the original snippet used
// Object, which has no getId() method).
for (MyEntity entity : entities) {
    if (entity.getId() == null) {
        entityManager.persist(entity);
    } else {
        entityManager.merge(entity);
    }
    // After every batchSize entities, commit the batch and clear the
    // persistence context so already-flushed entities can be garbage collected.
    if (++i % batchSize == 0) {
        entityManager.getTransaction().commit();
        entityManager.clear();
        entityManager.getTransaction().begin();
    }
}

entityManager.getTransaction().commit();

In this example you are practically using one database connection, so even if you use connection pooling you won't have to acquire and release 10k DB connections. The persistence context is cleared each time the batchSize threshold is reached, so entities from already-committed batches become eligible for garbage collection and memory stays bounded.

If you store 10k entities in a session and commit the transaction at once you will run into the following issues:

  • the database will hold locks for a longer period of time and will create huge undo transaction logs (in case your database uses MVCC)
  • the entities won't be garbage collected, as they are still attached to the Hibernate Session

Upvotes: 5

Marnee

Reputation: 619

The solution (or at least a way to greatly decrease CPU usage) was to switch from merge to persist. I had mentioned in my post that I had tried persist earlier with no discernible difference.

I have since found a better way to profile heavy loads, and with it I was able to show the improvement: switching from merge to persist for the particular load I was running reduced average CPU usage from 16% to 5%.

We do not need to merge. For the permanent fix, we need to rework the code a little to be able to use the same EntityManager for loading the root object and then persisting it (which will then cascade the full "tree" structure). This way, our object doesn't become detached and we don't need to use merge.
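
Roughly, the reworked version will look something like this (the child entity JobItemPO and the accessor names are illustrative, and error handling is again redacted):

public void saveJob(final long jobId, final List<JobItemPO> newItems) {
    final EntityManager entityManager = entityManagerFactory.createEntityManager();
    final EntityTransaction transaction = entityManager.getTransaction();

    try {
        transaction.begin();

        // Load the root with the same EntityManager that will flush the inserts,
        // so it stays managed for the whole unit of work.
        final JobPO root = entityManager.find(JobPO.class, jobId);

        // Attach the new (transient) children to the managed root.
        for (final JobItemPO item : newItems) {
            item.setJob(root);
            root.getItems().add(item);
        }

        // persist() is a no-op for the already-managed root, but it cascades
        // PERSIST to the new children, which are inserted at commit; nothing is
        // detached, so merge is never needed.
        entityManager.persist(root);

        transaction.commit();
    } catch (final Throwable e) {
        // error handling redacted for brevity
    } finally {
        entityManager.close();
    }
}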

Thanks to ddalton for pointing in that direction.

Upvotes: 0

Javier Sánchez

Reputation: 1054

Well, you should avoid opening and closing connections on each update because it hurts performance. Instead, you can configure your persistence provider to use batch processing, set a reasonable batch size, and then perform the inserts/updates in batches.

<persistence-unit name="pu" transaction-type="RESOURCE_LOCAL">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <properties>
        <property name="hibernate.dialect" value="org.hibernate.dialect.OracleDialect"/>
        <property name="hibernate.connection.username" value="***"/>
        <property name="hibernate.connection.password" value="***"/>
        <property name="hibernate.connection.driver_class" value="oracle.jdbc.OracleDriver"/>
        <property name="hibernate.connection.url" value="jdbc:oracle:thin:@***"/>
        <property name="hibernate.jdbc.batch_size" value="100"/>
    </properties>
</persistence-unit>

This lets Hibernate send multiple insert/update statements to the database in a single batch when you are inserting or updating in a loop (it is transparent to you).

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Employee employee = new Employee(.....);
    session.save(employee);
    // flush and clear every 100 records (matching hibernate.jdbc.batch_size)
    // so the session does not accumulate 100k managed entities
    if (i % 100 == 0 && i > 0) {
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();

References: http://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm

Upvotes: 0
