adhg

Reputation: 10883

Hibernate performance issue: persist one by one or in bulk?

I have a text file of ~6GB which I need to parse and then persist. By 'parsing' I mean reading a line from the file (usually 2000 chars), creating a Car object from it, and later persisting it.

I'm using a producer-consumer pattern to parse and persist, and I wonder whether it makes any difference (for performance reasons) to persist one object at a time or 1000 (or any other amount) per commit.

At the moment it takes me >2hr to persist everything (3 million lines), which seems like too much time to me (or I may be wrong).

Currently I'm doing this:

public void persistCar(Car car) throws Exception
{
    try
    {
        carDAO.beginTransaction();  // get hibernate session...
        carDAO.save(car);           // do the save here
        carDAO.commitTransaction(); // commit the session
    }
    catch (Exception e)
    {
        carDAO.rollback();
        e.printStackTrace();
    }
    finally
    {
        carDAO.close();
    }
}

Before I make any design changes I was wondering if there's a reason why this design is better (or not) and if so, what should be the cars.size()? Also, is open/close of session considered expensive?

public void persistCars(List<Car> cars) throws Exception
{
    try
    {
        carDAO.beginTransaction();  // get hibernate session...
        for (Car car : cars)
        {
            carDAO.save(car);       // do all saves here
        }
        carDAO.commitTransaction(); // commit the session
    }
    catch (Exception e)
    {
        carDAO.rollback();
        e.printStackTrace();
    }
    finally
    {
        carDAO.close();
    }
}

Upvotes: 6

Views: 2468

Answers (1)

ManuPK

Reputation: 11839

Traditionally, Hibernate does not handle bulk inserts very well, but there are ways to optimize it to some extent.

Take this example from the API docs:

Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();

for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}

tx.commit();
session.close();

In the above example the session is flushed and cleared after every 20 inserts, which keeps the first-level cache from growing unboundedly and makes the operation a little faster.
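For the flush/clear pattern above to actually result in batched JDBC statements, Hibernate's batch size should be configured to match the interval used in the loop. A minimal sketch of the relevant configuration (property names are standard Hibernate settings; the values are illustrative):

```
hibernate.jdbc.batch_size=20
hibernate.order_inserts=true
```

Note that JDBC batching is silently disabled for entities using `IDENTITY` id generation, since Hibernate must read back the generated key after each insert.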

Here is an interesting article discussing the same topic.

We have successfully implemented an alternative approach to bulk inserts using stored procedures. In this case you pass the parameters to the SP as a "|"-separated list and write the insert scripts inside the SP. The code may look a bit complex, but it is very effective.
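As a rough sketch of that approach (the procedure name `bulk_insert_cars`, the delimiters, and the field layout are all illustrative assumptions, not from the answer): the caller joins each row's column values with "|", joins rows with ";", and hands the whole string to a stored procedure in a single JDBC call, letting the database split it and run the inserts server-side.

```java
import java.sql.CallableStatement;
import java.sql.Connection;
import java.util.List;
import java.util.StringJoiner;

public class BulkInsertSketch {

    // Join each row's fields with "|" and the rows themselves with ";".
    // Purely illustrative delimiters; real code must escape or reject
    // field values containing the delimiter characters.
    static String toDelimitedParam(List<String[]> rows) {
        StringJoiner rowJoiner = new StringJoiner(";");
        for (String[] row : rows) {
            rowJoiner.add(String.join("|", row));
        }
        return rowJoiner.toString();
    }

    // One round trip for the whole batch: a hypothetical stored procedure
    // "bulk_insert_cars" splits the string server-side and runs the INSERTs.
    static void callBulkInsert(Connection conn, String param) throws Exception {
        try (CallableStatement cs = conn.prepareCall("{call bulk_insert_cars(?)}")) {
            cs.setString(1, param);
            cs.execute();
        }
    }
}
```

The win here is that the entire batch costs one network round trip and bypasses Hibernate's session entirely, at the price of moving insert logic into the database.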

Upvotes: 5
