Reputation: 81
I have a couple of customers who want to update their data using CSV files; a typical customer's file has 1,000 to 20,000 rows.
After parsing the rows I merge the new data into the existing entities. All entities that have been modified are saved using a batch save:
public void batchSaveOrUpdate(List<Item> items) {
    Transaction transaction = session.beginTransaction();
    List<Item> itemsToEvict = new ArrayList<>();
    int count = 0;
    for (Item item : items) {
        session.saveOrUpdate(item);
        itemsToEvict.add(item);
        if (++count % 25 == 0) {
            // flush the current batch and detach the processed entities
            session.flush();
            itemsToEvict.forEach(session::evict);
            itemsToEvict.clear();
        }
    }
    // flush the remaining changes before committing
    session.flush();
    transaction.commit();
}
It works well for normal imports of up to 25,000 rows, but now I have a customer whose CSV file contains up to 600,000 rows. Even after determining which items have been modified, there are still 100,000+ entities that have to be updated at once. Since WildFly reaps all transactions after a while, every import from this large customer fails. I have already increased the transaction timeout to one hour, and increasing it further is not an option for me.
What can I do to improve the Hibernate update performance at this point?
Upvotes: 3
Views: 322
Reputation: 4305
First of all, CopyManager is the most efficient way to copy a CSV into PostgreSQL. Maybe you should create a temporary table, COPY the CSV into it, and then perform a set of INSERT / UPDATE SQL statements. It may be tricky to get hold of the PGConnection and the CopyManager, e.g. it is not possible if you use the c3p0 connection pool.
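A rough sketch of that approach (the staging table, column names and file path are just placeholders for your schema, and I'm assuming the plain PostgreSQL JDBC driver so the connection can be unwrapped to PGConnection):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.Statement;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public void importCsvViaCopy(Connection connection, String csvPath) throws Exception {
    try (Statement stmt = connection.createStatement()) {
        // staging table mirroring the CSV columns (placeholder columns)
        stmt.execute("CREATE TEMP TABLE item_staging (id bigint, name text, price numeric)");
    }

    // CopyManager streams the whole file in a single COPY command
    CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
    try (Reader reader = new FileReader(csvPath)) {
        copyManager.copyIn("COPY item_staging FROM STDIN WITH (FORMAT csv, HEADER true)", reader);
    }

    try (Statement stmt = connection.createStatement()) {
        // set-based merge from the staging table into the real table
        stmt.execute("UPDATE item i SET name = s.name, price = s.price "
                   + "FROM item_staging s WHERE i.id = s.id");
        stmt.execute("INSERT INTO item (id, name, price) "
                   + "SELECT s.id, s.name, s.price FROM item_staging s "
                   + "WHERE NOT EXISTS (SELECT 1 FROM item i WHERE i.id = s.id)");
    }
}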
It is also important to note that, according to Chapter 4. Batch Processing:
Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
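So if Item uses GenerationType.IDENTITY, inserts will not be batched no matter how you configure Hibernate. A minimal sketch of switching to a sequence-based generator (the sequence name and allocation size are assumptions):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Item {

    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "item_seq")
    @SequenceGenerator(name = "item_seq", sequenceName = "item_seq", allocationSize = 50)
    private Long id;

    // ... remaining fields
}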
Upvotes: 1
Reputation: 23562
Make sure you are actually using batching. Configure the batch-related properties properly:
<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>
Here I've set hibernate.jdbc.batch_size to 100; this value should match the number of items processed before you flush the session (25 in your case; maybe you should try a higher value, like 100).
Also, you are evicting items explicitly. If Item is associated with other entities and neither cascade type ALL nor DETACH is specified on the association, then eviction will not cascade to the related entities. They will consume lots of heap, because Hibernate keeps them in the persistence context (session), which degrades application performance significantly when there are hundreds of thousands of them.
Make sure to detach (evict) the entire object graph, or clear the entire session and read the next chunk of items to be processed.
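A minimal sketch of what I mean, assuming a chunk size of 100 to match hibernate.jdbc.batch_size (session.clear() detaches everything in the persistence context, including associated entities):

public void batchSaveOrUpdate(List<Item> items) {
    final int batchSize = 100; // keep in sync with hibernate.jdbc.batch_size
    Transaction transaction = session.beginTransaction();
    for (int i = 0; i < items.size(); i++) {
        session.saveOrUpdate(items.get(i));
        if ((i + 1) % batchSize == 0) {
            session.flush();  // push the current batch to the database
            session.clear();  // detach everything, including associated entities
        }
    }
    session.flush();
    transaction.commit();
}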
Upvotes: 1
Reputation: 4584
You can try writing your own stored procedure. Hibernate is not the best fit for your needs.
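For example (a sketch only; bulk_update_items is a hypothetical procedure you would write yourself, e.g. one that merges a staging table into the target table):

// executes a hypothetical stored procedure (bulk_update_items)
// through Hibernate's ProcedureCall API
org.hibernate.procedure.ProcedureCall call =
        session.createStoredProcedureCall("bulk_update_items");
call.getOutputs().release();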
Upvotes: 1