abl
abl

Reputation: 5958

Performance of batched statements using INSERT ... SET vs INSERT ... VALUES

I recently wrote a simple Java program that processed some data and inserted it in a MyISAM table. About 35000 rows had to be inserted. I wrote the INSERT statement using INSERT ... SET syntax and executed it for all rows with PreparedStatement.executeBatch(). So:

String sql = "INSERT INTO my_table"
            + " SET "
                + " my_column_1 = ? "
                + " my_column_2 = ? "
                ... 
                + " my_column_n = ? ";
try(PreparedStatement pst = con.prepareStatement(sql)){
    for(Object o : someCollection){
        pst.setInt(1, ...);
        pst.setInt(2, ...);
        ...
        pst.setInt(n, ...);

        pst.addBatch();
    }
    pst.executeBatch();
}

I tried inserting all rows in a single batch and in bacthes of 1000, but in all cases the execution was VERY slow (about 1 minute per 1000 rows). After some tinkering I found that changing the syntax to INSERT ... VALUES improved the speed dramatically, 100x at the very least (I didn't measure it accurately).

String sql = "INSERT INTO my_table (my_column_1, my_column_2, ... , my_column_n)"
            + " VALUES (?, ?, ... , ?)";

What's going on here? Can it be that the JDBC driver cannot rewrite the batches when using INSERT ... SET? I didn't find any documentation about this. I am creating my connections with options rewriteBatchedStatements=true&useServerPrepStmts=false.

I first noticed this problem when accessing a database in another host. That is, I have used the INSERT ... SET approach before without any noticeable performance issue in applications that were executing in the same host as the database. So I guess the problem may be that many more statements are sent over the network with INSERT ... SET than with INSERT ... VALUES.

Upvotes: 1

Views: 704

Answers (1)

user2910265
user2910265

Reputation: 894

If you examine the INSERT ... SET syntax, you'll see it's only meant for inserting a single row. INSERT ... VALUES is meant for inserting multiple rows at one time.

In other words - even though you set rewriteBatchedStatements=true, the JDBC driver can't optimize the SET variation like it can with the VALUES variation because SET is not built for the batch case you have. Use VALUES to compress N inserts into one.

Bonus tip - If you use ON DUPLICATE KEY UPDATE, the JDBC currently can't rewrite those statements either. (edit: This statement is false - my mistake.)

There's an option you can set to verify all of this for yourself (I think it's 'profileSQL').

Upvotes: 1

Related Questions