Asha Kabra

Reputation: 41

What more can be done to improve performance of bulk inserts on SAP HANA DB using JDBC executeBatch()?

I want to insert over a million records and I am using the code snippet below to do so.

final int batchSize = 5000;
int count = 0;
final int rows = 2000000;

Connection con = null;
PreparedStatement batchPs = null;

try {
    con = getConnection();
    String psBatchQuery = "insert into SYSTEM.Employee (id, name) values (?, ?)";
    batchPs = con.prepareStatement(psBatchQuery);

    long psBatchStart = System.currentTimeMillis();

    for (int i = 0; i < rows; i++) {
        batchPs.setInt(1, i);
        batchPs.setString(2, "Name" + i);

        batchPs.addBatch();

        if (++count % batchSize == 0) {
            // This inner try/catch is here because I do not want to stop
            // processing the remaining records if one record fails.
            try {
                batchPs.executeBatch();
            } catch (BatchUpdateException a) {
                a.printStackTrace();
            }
        }
    }
    batchPs.executeBatch();
} catch (SQLException e) {
    e.printStackTrace();
}

With the above parameters it takes approx. 212,349 milliseconds (about 212 seconds).

1) What more can be done to improve the performance?
2) What would be an optimal batchSize? If I reduce batchSize to 100, the time taken increases.
3) I am not explicitly committing, because I do not want to roll back non-erroneous records. Will setting autocommit to false and committing manually once, after the last batch is executed, help reduce execution time?
4) Will the parameter splitBatchCommands from https://help.sap.com/viewer/0eec0d68141541d1b07893a39944924e/2.0.02/en-US/109397c2206a4ab2a5386d494f4cf75e.html help improve performance?

Upvotes: 1

Views: 1679

Answers (1)

Lars Br.

Reputation: 10386

Based on your numbers (2,000,000 records in ~212 seconds) the code is inserting ca. 9,400 records per second, which is indeed slow.

One thing to note in the code is that it does not specify the autocommit mode, which means it uses the default for HANA: autocommit = ON.

So, after every command, a commit is executed. From your description it seems to me that you don't actually want that, so adding

 con.setAutoCommit(false);

seems like a good first step to me.
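A minimal sketch of what that change could look like applied to the loop from the question (the table name and batch size are taken from the question; `getConnection()` is assumed to return an open HANA connection, and the error handling is kept deliberately simple):

```java
import java.sql.BatchUpdateException;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class BatchInsert {

    // Number of full batches the loop flushes before the final executeBatch().
    static int fullBatches(int rows, int batchSize) {
        return rows / batchSize;
    }

    // Same loop as in the question, but with autocommit off and a
    // single commit once all batches have been sent.
    static void insertAll(Connection con, int rows, int batchSize) throws SQLException {
        con.setAutoCommit(false); // one transaction instead of one commit per batch
        try (PreparedStatement ps = con.prepareStatement(
                "insert into SYSTEM.Employee (id, name) values (?, ?)")) {
            for (int i = 0; i < rows; i++) {
                ps.setInt(1, i);
                ps.setString(2, "Name" + i);
                ps.addBatch();
                if ((i + 1) % batchSize == 0) {
                    try {
                        ps.executeBatch(); // flush a full batch
                    } catch (BatchUpdateException e) {
                        e.printStackTrace(); // keep going; failed rows are reported here
                    }
                }
            }
            ps.executeBatch(); // flush the remainder
            con.commit();      // single commit at the end
        }
    }
}
```

With rows = 2,000,000 and batchSize = 5000 that is 400 full batches plus one (empty) trailing flush before the commit.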

If performance is still bad afterwards, I'd recommend looking at where the time is spent. The JDBC performance trace option of the HANA JDBC driver could be useful here. You may also do a quick check and see how long it takes to insert the 2 million records on the server itself, without sending them from the client.

Something like

insert into SYSTEM.Employee (id, name) 
(select TOP 2000000
        1 as id 
      , 'NAME' as name
 from objects a 
      cross join objects b 
      cross join objects c);

can provide you with a sort of baseline for how long this reasonably would take on your server.


Ok, so I had another look into this and tried to reproduce the bad performance on a HANA instance in my local network (connected via WIFI).

The unchanged test case (autocommit at its default of true) finished in 18 secs on average (~111,111 records/s), which is much closer to what I would expect.

With a change to autocommit = false this changed only a little, indicating that the wait time for the commit was not the critical factor for the total runtime.

Another quick change was to enable compression (this is available with SAP HANA 2 SP03 or SP04, if I'm not mistaken). Enabling compression for the network communication reduced the total runtime of this test to 14 secs (a ~22% reduction).
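One way to enable this on the client side is via a JDBC connection property when opening the connection. A sketch, assuming the property is named `compress` (verify the exact property name against the HANA client documentation for your driver version; host, port, and credentials here are placeholders):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

public class CompressedConnection {

    // Build the connection properties; "compress" is an assumed property
    // name enabling network compression - check your driver's documentation.
    static Properties connectionProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("compress", "true");
        return props;
    }

    // Open a connection with the given properties (placeholder host/port).
    static Connection open(String host, int port, Properties props) throws SQLException {
        return DriverManager.getConnection("jdbc:sap://" + host + ":" + port + "/", props);
    }
}
```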

Without spending too much time trying to reproduce this scenario, I'd say it is definitely worthwhile to check whether network performance is what's causing the issue in your case.

Upvotes: 1
