Reputation: 36726
I have an application that needs to insert a lot of rows into the database, but the insertion is currently too slow. I want to know the best techniques for speeding up the insertion as much as possible.
The situation is that for each insertion I need to discover the id of the token, so I put a SELECT inside the INSERT.
What would be the best solution, in SQL and Java, for speeding up the insertion into the textBlockHasToken table as much as possible?
Edit: It works at the moment, but I have a large data flow. Some numbers: 100,000 distinct rows in the token table and 2 million rows in textBlockHasToken.
I have these three tables and two INSERT statements:
-- Text blocks of the pages
CREATE TABLE textBlock(
id INTEGER PRIMARY KEY,
pageId INTEGER REFERENCES page,
text VARCHAR NOT NULL,
position INTEGER NOT NULL
);
-- Tokens in the text blocks
CREATE TABLE token(
id INTEGER PRIMARY KEY,
text VARCHAR NOT NULL,
charType VARCHAR NOT NULL,
grammar VARCHAR NOT NULL,
category VARCHAR[] NOT NULL,
stopWord BOOLEAN NOT NULL DEFAULT false,
UNIQUE(text)
);
-- N:N relationship between textblock and token
CREATE TABLE textBlockHasToken(
textBlockId INTEGER REFERENCES textBlock NOT NULL,
tokenId INTEGER REFERENCES token NOT NULL,
sentence INTEGER NOT NULL,
position INTEGER NOT NULL
);
-- Insert the token
INSERT INTO token(id,text,charType,grammar,category)
VALUES(nextval('tokenIdSqc'),?,?,?,?);
-- Insert the N:M relationship row
INSERT INTO textBlockHasToken(textblockId,tokenId,sentence,position)
VALUES(?,(SELECT id FROM token WHERE text = ?),?,?);
Upvotes: 1
Views: 1164
Reputation: 44250
-- Text blocks of the pages
DROP TABLE tmp.textblock CASCADE;
CREATE TABLE tmp.textblock
( id INTEGER PRIMARY KEY
, pageid INTEGER -- REFERENCES page
, ztext VARCHAR NOT NULL
, position INTEGER NOT NULL
);
-- Tokens in the text blocks
DROP TABLE tmp.token CASCADE;
DROP SEQUENCE tmp.tokenidsqc;
CREATE SEQUENCE tmp.tokenidsqc;
CREATE TABLE tmp.token
( id INTEGER PRIMARY KEY DEFAULT nextval('tmp.tokenidsqc')
, ztext VARCHAR NOT NULL
, chartype VARCHAR NOT NULL
, grammar VARCHAR NOT NULL
, category VARCHAR NOT NULL
, stopword BOOLEAN NOT NULL DEFAULT false
, UNIQUE(ztext)
);
-- N:N relationship between textblock and token
DROP TABLE tmp.textblockhastoken CASCADE;
CREATE TABLE tmp.textblockhastoken
( textblockid INTEGER NOT NULL REFERENCES tmp.textblock(id)
, tokenid INTEGER NOT NULL REFERENCES tmp.token(id)
, sentence INTEGER NOT NULL
, position INTEGER NOT NULL
);
-- Insert the token
INSERT INTO tmp.token(ztext,chartype,grammar,category)
VALUES('foo' , 'T' , 'a', 'a' ), ('bar' , 'T' , 'a', 'a' );
SELECT * FROM tmp.token;
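Since the id column now has a DEFAULT, PostgreSQL's RETURNING clause can also hand the generated id back in the same round trip, so no separate SELECT per token is needed. A minimal Java sketch of that idea (not from the answer itself; the connection handling is assumed):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Insert a token and get its generated id back in the same round trip
int insertToken(Connection conn, String ztext, String chartype,
                String grammar, String category) throws SQLException {
    String sql = "INSERT INTO tmp.token(ztext, chartype, grammar, category) "
               + "VALUES (?, ?, ?, ?) RETURNING id";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setString(1, ztext);
        ps.setString(2, chartype);
        ps.setString(3, grammar);
        ps.setString(4, category);
        try (ResultSet rs = ps.executeQuery()) {
            rs.next();
            return rs.getInt(1); // id usable directly for textblockhastoken
        }
    }
}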
Upvotes: 3
Reputation: 66263
duffymo has already mentioned batch updates. Just to be sure, I would add: without knowing anything about the existing code, this is all very much crystal-ball guessing.
Upvotes: 1
Reputation: 730
I totally agree with duffymo, but I would also add that Postgres's COPY feature is the fastest way to import data.
Of course, it's not always feasible to do so, especially from within code, which is why I also agree with duffymo: if you NEED to do it from code on a separate machine, do what duffymo said.
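For the from-code case, the PostgreSQL JDBC driver does expose COPY through its CopyManager API. A minimal sketch, assuming that driver (table and column names are taken from the question; the tab-separated payload is illustrative):
import java.io.StringReader;
import java.sql.Connection;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

// Stream many rows in one COPY instead of issuing one INSERT per row
void bulkLoad(Connection conn) throws Exception {
    CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
    String rows = "1\t42\t1\t1\n"   // textBlockId, tokenId, sentence, position
                + "1\t43\t1\t2\n";
    copy.copyIn(
        "COPY textBlockHasToken(textBlockId, tokenId, sentence, position) FROM STDIN",
        new StringReader(rows));
}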
Upvotes: 1
Reputation: 308958
The best thing you can do is make sure that all columns used in WHERE clauses have indexes. They're created automatically for primary keys; make sure foreign key columns are indexed as well.
The other consideration is batching requests. Don't do a network round trip for each INSERT; batch them together in chunks and commit each chunk. It means fewer round trips and manageable transaction logs as well.
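A minimal JDBC sketch of that chunked batching (the Row type, batch size, and column values are placeholders, not from the answer):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

// Batch the inserts in chunks and commit once per chunk
void insertInChunks(Connection conn, List<Row> rows) throws SQLException {
    String sql = "INSERT INTO textBlockHasToken(textBlockId, tokenId, sentence, position) "
               + "VALUES (?, ?, ?, ?)";
    conn.setAutoCommit(false);
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        int batchSize = 1000, count = 0;
        for (Row r : rows) {          // Row is a stand-in for your own type
            ps.setInt(1, r.textBlockId);
            ps.setInt(2, r.tokenId);
            ps.setInt(3, r.sentence);
            ps.setInt(4, r.position);
            ps.addBatch();
            if (++count % batchSize == 0) {
                ps.executeBatch();    // one round trip for the whole chunk
                conn.commit();        // keeps the transaction log manageable
            }
        }
        ps.executeBatch();            // flush the remainder
        conn.commit();
    }
}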
Upvotes: 5