Reputation: 155
I'm currently using Apache Derby as a temporary local database for my application, and I'd like some tips on how I can optimise INSERT performance.
When the application starts, it reads records in from a CSV file - this could be millions of records, but more likely hundreds of thousands. Once they are processed, a SELECT query at the end summarises the information.
It currently takes around 1 minute to insert 6,000 records. There are 3 tables: 2 of them store unique data, and one is a lookup table between the two. The schema looks as follows:
CREATE TABLE table1
(id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
 field1 VARCHAR(40));

CREATE TABLE table2
(id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
 field1 VARCHAR(100),
 field2 INT,
 field3 VARCHAR(40));

CREATE TABLE table1table2lookup
(id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
 field1 INT NOT NULL,
 field2 INT NOT NULL);
I currently use parameterised PreparedStatements and batching to try to improve the speed.
I have also switched off auto-commit mode so that I only commit once per bulk insert.
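Roughly, the insert loop currently looks like this (simplified sketch; the connection URL, batch size and the loadCsvRecords() helper are stand-ins for the real application code):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BulkInsert {
    public static void main(String[] args) throws SQLException {
        // Embedded Derby connection; the database name is just an example.
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB;create=true")) {
            conn.setAutoCommit(false); // commit once at the end, not per INSERT

            String sql = "INSERT INTO table1 (field1) VALUES (?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int count = 0;
                for (String record : loadCsvRecords()) { // placeholder for the CSV parsing
                    ps.setString(1, record);
                    ps.addBatch();
                    if (++count % 1000 == 0) {
                        ps.executeBatch(); // flush every 1000 rows
                    }
                }
                ps.executeBatch(); // flush any remaining rows
            }
            conn.commit(); // single explicit commit for the whole load
        }
    }

    private static List<String> loadCsvRecords() {
        // Stand-in for the application's CSV reader.
        return List.of("value1", "value2");
    }
}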
Upvotes: 1
Views: 366
Reputation: 32980
From the "Tuning Derby" doc:
Avoid inserts in autocommit mode if possible. Inserts can be painfully slow in autocommit mode because each commit involves an update of the log on the disk for each INSERT statement. The commit will not return until a physical disk write is executed. To speed things up: run in autocommit false mode, execute a number of inserts in one transaction, and then explicitly issue a commit.
But for your initial import you should consider using the import procedures to load the data directly from a file:
If your application allows an initial load into the table, you can use the import procedures to insert data into a table. Derby will not log the individual inserts when loading into an empty table using these interfaces. See the Derby Tools and Utilities Guide for more information on the import procedures.
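For example, a call to the built-in SYSCS_UTIL.SYSCS_IMPORT_TABLE procedure from JDBC could look roughly like this (schema name, table name and file path are placeholders; check the guide for the exact parameter meanings):

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class ImportExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:derby:myDB");
             CallableStatement cs = conn.prepareCall(
                     "CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE(?, ?, ?, ?, ?, ?, ?)")) {
            cs.setString(1, "APP");               // schema name (null = current schema)
            cs.setString(2, "TABLE1");            // table to load
            cs.setString(3, "/path/to/data.csv"); // input file (placeholder path)
            cs.setString(4, null);                // column delimiter (null = comma)
            cs.setString(5, null);                // character delimiter (null = double quote)
            cs.setString(6, null);                // codeset (null = platform default)
            cs.setInt(7, 0);                      // 0 = append; non-zero = replace existing rows
            cs.execute();
        }
    }
}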
Upvotes: 3