Theo

Reputation: 132922

How do you do an extended insert using JDBC without building strings?

I've got an application that parses log files and inserts a huge amount of data into a database. It's written in Java and talks to a MySQL database over JDBC. I've experimented with different ways of inserting the data to find the fastest for my particular use case. The one that currently seems to be the best performer is to issue an extended insert (e.g. a single insert with multiple rows), like this:

INSERT INTO the_table (col1, col2, ..., colN) VALUES
(v1, v2, v3, ..., vN),
(v1, v2, v3, ..., vN),
...,
(v1, v2, v3, ..., vN);

The number of rows can be tens of thousands.

I've tried using prepared statements, but it's nowhere near as fast, probably because each insert is still sent to the DB separately and the table needs to be locked and whatnot. My colleague who worked on the code before me tried using batching, but that didn't perform well enough either.

The problem is that using extended inserts means that, as far as I can tell, I need to build the SQL string myself (since the number of rows is variable), and that opens up all sorts of SQL injection vectors that I'm nowhere near clever enough to find myself. There's got to be a better way to do this.

Obviously I escape the strings I insert, but only with something like str.replace("\"", "\\\""); (repeated for ', ? and \), but I'm sure that isn't enough.
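
One direction I've been considering is to generate the placeholder groups at runtime and bind every value through a PreparedStatement, roughly like this (a sketch; rows stands in for my parsed log data, and the column count is hard-coded for brevity):

// Sketch: build one "(?, ?, ?)" group per row, then bind all values.
// `rows` is a stand-in for the parsed log data (one Object[] per row).
StringBuilder sql = new StringBuilder(
    "INSERT INTO the_table (col1, col2, col3) VALUES ");
for (int i = 0; i < rows.size(); i++) {
    sql.append(i == 0 ? "(?, ?, ?)" : ", (?, ?, ?)");
}
PreparedStatement stmt = con.prepareStatement(sql.toString());
int param = 1;
for (Object[] row : rows) {
    for (Object value : row) {
        stmt.setObject(param++, value);
    }
}
stmt.executeUpdate();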

Upvotes: 6

Views: 1953

Answers (5)

michael_j_ward

Reputation: 4569

The full answer is to use the rewriteBatchedStatements=true connection option together with the batched statement from dfa's answer.

The relevant MySQL documentation

A worked MySQL example
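
For example (a sketch; host, schema, and credentials are placeholders), the option is set on the JDBC URL, after which Connector/J rewrites a batch of single-row inserts into one multi-row insert on the wire:

// The option goes on the connection URL; everything else stays
// a normal batched PreparedStatement as in dfa's answer.
Connection con = DriverManager.getConnection(
    "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true",
    "user", "password");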

Upvotes: 0

Daniel

Reputation: 131

Regarding the difference between extended inserts and batching single inserts: the reason I decided to use extended inserts is that I noticed my code took a lot longer to insert a lot of rows than MySQL does from the terminal, even though I was batching inserts in batches of 5000. The solution in the end was to use extended inserts.

I quickly retested this theory.

I took two dumps of a table with 1.2 million rows: one using the default extended insert statements you get with mysqldump, and the other using:

mysqldump --skip-extended-insert

Then I simply imported the files again into new tables and timed it.

The extended insert test finished in 1m35s and the other in 3m49s.

Upvotes: 1

dfa

Reputation: 116412

prepared statements + batch insert:

// `con` is an open java.sql.Connection
PreparedStatement stmt = con.prepareStatement(
    "INSERT INTO employees VALUES (?, ?)");

stmt.setInt(1, 101);
stmt.setString(2, "Paolo Rossi");
stmt.addBatch();

stmt.setInt(1, 102);
stmt.setString(2, "Franco Bianchi");
stmt.addBatch();

// ... as many rows as you want, then send them in one go
stmt.executeBatch();
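
For tens of thousands of rows you may want to flush the batch in chunks rather than all at once (a sketch; the chunk size of 1000 is arbitrary, and `employees` with its getters is a hypothetical row source):

int count = 0;
for (Employee e : employees) {  // hypothetical row source
    stmt.setInt(1, e.getId());
    stmt.setString(2, e.getName());
    stmt.addBatch();
    if (++count % 1000 == 0) {
        stmt.executeBatch();    // flush periodically to bound memory use
    }
}
stmt.executeBatch();            // flush any remaining rows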

Upvotes: 4

pjp

Reputation: 17639

If you are loading tens of thousands of records then you're probably better off using a bulk loader.

http://dev.mysql.com/doc/refman/5.0/en/load-data.html
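
LOAD DATA can also be issued through JDBC if you first write the rows out to a file (a sketch; the file path and column list are placeholders, and LOCAL must be allowed by both the driver and the server):

// Streams a CSV file to the server in a single round trip.
Statement stmt = con.createStatement();
stmt.execute(
    "LOAD DATA LOCAL INFILE '/tmp/rows.csv' " +
    "INTO TABLE the_table " +
    "FIELDS TERMINATED BY ',' " +
    "LINES TERMINATED BY '\\n' " +
    "(col1, col2, col3)");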

Upvotes: 1

Nick Holt

Reputation: 34321

I would try batching your inserts and see how that performs.

Have a read of this (http://www.onjava.com/pub/a/onjava/excerpt/javaentnut_2/index3.html?page=2) for more information on batching.

Upvotes: 1
