Reputation: 619
We have recently implemented DB binding using Hibernate with EntityManager (no Spring) to write records to a database. To make things simple, I will only discuss the variation of the process that only does inserts. (Another, very similar, process updates an existing record once to set a status, but otherwise, just inserts a bunch of records.)
This process can insert as many as 10,000 records per transaction, though the average is less than that, probably by at least half. We might have a few instances of this process running in different threads at the same time under the same JVM.
We had a production issue where the service the process was running under was pegging out all 24 cores on the machine. (They added 12 just to try to accommodate this.) We have narrowed this high utilization down to Hibernate.
I have spent days researching and cannot find anything that might improve our performance, except to use hibernate.jdbc.batch_size along with hibernate.order_inserts. Unfortunately, we are using IDENTITY as our generation strategy, so Hibernate cannot (and will not) batch those inserts.
Beyond that, I have not found any other performance tips for doing large numbers of inserts. (I've seen many tips regarding reads, updates, and deletes, but very few for inserts.)
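For what it's worth, here is a rough sketch of the kind of ID mapping that JDBC batching would need (the class and sequence names are hypothetical, and it would require a schema change away from identity columns, which may not be an option for us):

import javax.persistence.*;

@Entity
public class BatchInsertablePO { // hypothetical entity, for illustration only
    // A SEQUENCE (or TABLE) generator lets Hibernate assign IDs up front and
    // defer the INSERTs to flush time, so hibernate.jdbc.batch_size and
    // hibernate.order_inserts can group them. IDENTITY forces an immediate
    // INSERT per row to retrieve the generated key, which disables batching.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "job_seq")
    @SequenceGenerator(name = "job_seq", sequenceName = "job_seq", allocationSize = 50)
    private Long id;
}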
We have a root JobPO object. We simply call merge on that and all the inserts are handled via the cascade annotations. We need to do this in a single transaction.
We have only 8 different tables that we are inserting into, but the hierarchy of records is a bit complicated.
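To give a feel for the mapping, here is a heavily simplified, hypothetical sketch of the root; the field and child class names are made up, and the real hierarchy spans all eight tables:

import java.util.ArrayList;
import java.util.List;
import javax.persistence.*;

@Entity
public class JobPO {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Cascade from the root so that a single merge()/persist() call saves the
    // whole graph. JobItemPO (hypothetical, not shown here) has a @ManyToOne
    // "job" field that owns the join column.
    @OneToMany(mappedBy = "job", cascade = CascadeType.ALL)
    private List<JobItemPO> items = new ArrayList<>();
}

The save itself then looks like this: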
public void saveOrUpdate(Object dataHierarchyRoot) {
    final EntityManager entityManager = entityManagerFactory.createEntityManager();
    final EntityTransaction transaction = entityManager.getTransaction();
    try {
        transaction.begin();
        // This single call may result in inserting up to 10K records
        entityManager.merge(dataHierarchyRoot);
        transaction.commit();
    } catch (final Throwable e) {
        // error handling redacted for brevity
    } finally {
        entityManager.close();
    }
}
We create the EntityManagerFactory only once.
Any ideas?
Additional notes:
No one has complained about the process using too much memory
For the variation of the process that does only inserts, we could just use "persist" instead of "merge". Both variations share the same code, though, so we do a merge. I tried switching to persist and saw no discernible improvement.
We do have annotations that result in bidirectional cascades on a few of the fields. I tried removing them, but being new to Hibernate, I couldn't get the object graph to save properly. From what I understand, though, this should not degrade performance for inserts. I am not using explicit "inverse" settings either, since that also does not seem to matter for inserts. I am a little shaky on both of these areas, though. Is there room for improvement here? (See the mapping sketch after these notes.)
We ran SQL Profiler during a single transaction. Nothing seemed amiss and I did not spot room for improvement. (There were a large number of exec sp_prepexec statements, approximately the same number as the number of records that were inserted. That was all that was reported.)
The code exhibiting this behavior in production was making an explicit call to entityManager.flush() just before the commit(). I removed that code in my local environment. It did not make a noticeable improvement, but I won't add it back, because there is no reason for us to call flush().
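Regarding the bidirectional mappings mentioned above: my understanding is that JPA's mappedBy is the equivalent of Hibernate's "inverse" setting, i.e. the @ManyToOne side owns the foreign-key column that is actually written during the insert, and the collection side is just a mirror. A hedged sketch of the child side (names are hypothetical):

import javax.persistence.*;

@Entity
public class JobItemPO { // hypothetical child of JobPO, for illustration only
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Long id;

    // Owning side of the association: this join column is the value that is
    // written during the insert. A JobPO.items collection marked
    // mappedBy = "job" would be the inverse side and issues no SQL of its own.
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "job_id")
    private JobPO job;
}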
Upvotes: 2
Views: 5354
Reputation: 153690
If you open and close a Session for every object you are about to save, then for 10k objects you are practically opening and closing 10k Sessions, flushing 10k times and going to the database for 10k round-trips.
You should at least batch multiple entities together:
int i = 0;
entityManager.getTransaction().begin();
for (final MyEntity entity : entities) { // MyEntity stands in for your own entity type
    if (entity.getId() == null) {
        entityManager.persist(entity);
    } else {
        entityManager.merge(entity);
    }
    i++;
    if (i % batchSize == 0) {
        // Commit and clear periodically so the persistence context stays small
        entityManager.getTransaction().commit();
        entityManager.clear();
        entityManager.getTransaction().begin();
    }
}
entityManager.getTransaction().commit();
In this example you are practically using one database connection, so even if you use connection pooling you won't have to acquire/release 10k DB connections. The Session is cleared each time the batchSize threshold is reached, which reduces pressure on JVM garbage collection.
If you store 10k entities in a single session and commit the transaction all at once, the persistence context grows very large and the transaction stays open for a long time, which brings problems of its own.
Upvotes: 5
Reputation: 619
The solution (or at least a way to greatly decrease CPU usage) was to switch from merge to persist. I had mentioned in my post that I had tried switching to persist with no discernible difference.
I have subsequently found a better way to profile heavy loads, and with that I was able to show the improvement. Switching from merge to persist for the particular load I was running reduced average CPU usage from 16% to 5%.
We do not need to merge. For the permanent fix, we need to rework the code a little to be able to use the same EntityManager for loading the root object and then persisting it (which will then cascade the full "tree" structure). This way, our object doesn't become detached and we don't need to use merge.
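A rough sketch of the reworked insert-only path (hedged; the method name is illustrative and not our production code):

public void saveNewHierarchy(final Object dataHierarchyRoot) {
    final EntityManager entityManager = entityManagerFactory.createEntityManager();
    final EntityTransaction transaction = entityManager.getTransaction();
    try {
        transaction.begin();
        // persist() attaches the new graph directly and cascades to the children.
        // merge() would instead copy every object in the detached graph into the
        // persistence context, which appears to be where the extra CPU went.
        entityManager.persist(dataHierarchyRoot);
        transaction.commit();
    } catch (final Throwable e) {
        // error handling redacted for brevity
    } finally {
        entityManager.close();
    }
}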
Thanks to ddalton for pointing in that direction.
Upvotes: 0
Reputation: 1054
Well, you should avoid opening and closing connections on each update because it hurts performance. Instead, you can configure your persistence provider to use batch processing, set a reasonable batch size, and then perform the inserts/updates in batches.
<persistence-unit name="pu" transaction-type="RESOURCE_LOCAL">
    <provider>org.hibernate.ejb.HibernatePersistence</provider>
    <properties>
        <property name="hibernate.dialect" value="org.hibernate.dialect.OracleDialect"/>
        <property name="hibernate.connection.username" value="***"/>
        <property name="hibernate.connection.password" value="***"/>
        <property name="hibernate.connection.driver_class" value="oracle.jdbc.OracleDriver"/>
        <property name="hibernate.connection.url" value="jdbc:oracle:thin:@***"/>
        <property name="hibernate.jdbc.batch_size" value="100"/>
    </properties>
</persistence-unit>
This allows sending multiple insert/update statements to the database in a single batch when you are inserting/updating in a loop (it is transparent to you).
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for (int i = 0; i < 100000; i++) {
    Employee employee = new Employee(/* ... */);
    session.save(employee);
    if (i % 100 == 0) {
        // Flush the current batch of inserts and clear the session
        // so the first-level cache does not grow unbounded
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
References: http://www.tutorialspoint.com/hibernate/hibernate_batch_processing.htm
Upvotes: 0