PeterPan

Reputation: 81

Hibernate: how to handle 100,000+ entity updates efficiently

I have a couple of customers who want to update their data using CSV files; a typical customer's file has 1,000 to 20,000 rows.

After parsing the rows I merge the new data into the existing entities. All entities that have been modified are saved using a batch save:

public void batchSaveOrUpdate(List<Item> items) {
    Transaction transaction = session.beginTransaction();
    List<Item> itemsToEvict = new ArrayList<>();

    int count = 0;
    for (Item item : items) {
        session.saveOrUpdate(item);
        itemsToEvict.add(item);

        if (count++ % 25 == 0) {
            session.flush();
            itemsToEvict.forEach(session::evict);
            itemsToEvict.clear();
        }
    }

    transaction.commit();
    session.flush();
}

It works well for normal imports of up to 25,000 rows, but now I have a customer whose CSV file contains up to 600,000 rows. Even after determining which items have been modified, there are 100,000+ entities that have to be updated at once. Since WildFly reaps all transactions after a while, every import from this large customer fails. I have already increased the transaction timeout to one hour, and increasing it further is not a solution for me.

What can I do to increase the Hibernate update performance at this point?

Upvotes: 3

Views: 322

Answers (3)

sibnick

Reputation: 4305

First of all, CopyManager is the most efficient way to copy a CSV into PostgreSQL. Maybe you should create a temp table, copy the CSV into it, and then perform a set of insert/update SQL statements, as sketched below. It can be tricky to get hold of the PGConnection and CopyManager; for example, it is not possible if you use the c3p0 connection pool.
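
A minimal sketch of that approach, assuming a PostgreSQL JDBC connection you can unwrap, a target table item, and a staging table item_import with illustrative columns id, name and price (none of these names come from the question):

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.Statement;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CsvCopyImport {

    public void importCsv(Connection connection, String csvPath) throws Exception {
        // Stage the CSV rows in a temp table that disappears with the session
        try (Statement statement = connection.createStatement()) {
            statement.execute("CREATE TEMP TABLE item_import (LIKE item INCLUDING DEFAULTS)");
        }

        // CopyManager needs the underlying PGConnection, so unwrap the pooled
        // connection (this is the step that may fail with pools like c3p0)
        CopyManager copyManager = connection.unwrap(PGConnection.class).getCopyAPI();
        try (Reader reader = new FileReader(csvPath)) {
            copyManager.copyIn("COPY item_import FROM STDIN WITH (FORMAT csv, HEADER true)", reader);
        }

        // Merge the staged rows into the real table with plain SQL
        try (Statement statement = connection.createStatement()) {
            statement.execute(
                "UPDATE item SET name = i.name, price = i.price " +
                "FROM item_import i WHERE item.id = i.id");
            statement.execute(
                "INSERT INTO item (id, name, price) " +
                "SELECT i.id, i.name, i.price FROM item_import i " +
                "WHERE NOT EXISTS (SELECT 1 FROM item WHERE item.id = i.id)");
        }
    }
}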

Also, it is important to note that, according to Chapter 4. Batch Processing:

Hibernate disables insert batching at the JDBC level transparently if you use an identity identifier generator.
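
So if your Item entity currently uses an identity-style identifier, switching to a sequence-based generator keeps insert batching enabled. A minimal sketch, assuming PostgreSQL and a sequence named item_seq (both assumptions, not from the question):

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.SequenceGenerator;

@Entity
public class Item {

    @Id
    // Sequence-based ids keep JDBC insert batching enabled, unlike IDENTITY
    @SequenceGenerator(name = "item_seq", sequenceName = "item_seq", allocationSize = 50)
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "item_seq")
    private Long id;

    // ... other fields and accessors
}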

Upvotes: 1

Dragan Bozanovic

Reputation: 23562

Make sure you are actually using batching. Configure the batch-related properties properly:

<property name="hibernate.jdbc.batch_size">100</property>
<property name="hibernate.order_inserts">true</property>
<property name="hibernate.order_updates">true</property>

Here I've set hibernate.jdbc.batch_size to 100; this value should match the number of items you process before flushing the session (25 in your case; you might try a larger value, such as 100).

Also, you are evicting items explicitly. If Item is associated with other entities and neither cascade type ALL nor DETACH is specified on the association, eviction will not cascade to the related entities. Hibernate will keep them in the persistence context (session), where they consume lots of heap and degrade application performance significantly when there are hundreds of thousands of them.

Make sure to detach (evict) the entire object graph, or clear the entire session and read the next chunk of items to be processed; a sketch of the chunked variant follows below.
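
A minimal sketch of that chunked variant, assuming the chunk size matches hibernate.jdbc.batch_size (the constant name BATCH_SIZE is just illustrative):

private static final int BATCH_SIZE = 100;

public void batchSaveOrUpdate(List<Item> items) {
    Transaction transaction = session.beginTransaction();
    try {
        for (int i = 0; i < items.size(); i++) {
            session.saveOrUpdate(items.get(i));

            if (i > 0 && i % BATCH_SIZE == 0) {
                session.flush();  // push the pending insert/update batch to the database
                session.clear();  // detach everything, including associated entities
            }
        }
        transaction.commit();
    } catch (RuntimeException e) {
        transaction.rollback();
        throw e;
    }
}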

Upvotes: 1

Maksym

Reputation: 4584

You can try writing your own stored procedure; Hibernate is not the best fit for your needs. A rough sketch of invoking one is shown below.
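
This is only a sketch; the procedure name import_items and the idea of doing the whole merge inside the database are assumptions, not something spelled out in the answer:

import java.sql.CallableStatement;

import org.hibernate.Session;

public class StoredProcedureImport {

    // Delegate the heavy merge work to a hypothetical database procedure and
    // invoke it once through the JDBC connection backing the Hibernate session
    public void runImport(Session session) {
        session.doWork(connection -> {
            try (CallableStatement call = connection.prepareCall("{call import_items()}")) {
                call.execute();
            }
        });
    }
}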

Upvotes: 1
