Reputation: 577
I'm working with PostgreSQL 9.1. Let's say I have a table where some columns have a UNIQUE
constraint. The simplest example:
CREATE TABLE test (
    value INTEGER NOT NULL UNIQUE
);
Now, when inserting some values, I have to separately handle the case where the values to be inserted are already in the table. I have two options:
1. SELECT beforehand to ensure the values are not in the table, or
2. INSERT and watch for any errors the server might return.
The application using the PostgreSQL database is written in Ruby. Here's how I would code the second option:
require 'pg'
db = PG.connect(...)
begin
  db.exec('INSERT INTO test VALUES (66)')
rescue PG::UniqueViolation
  # ... the values are already in the table
else
  # ... the values were brand new
end
db.close
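For comparison, the first option would be a lookup before the INSERT, something along these lines (just a sketch, using a literal value for illustration):
-- first option: check for the value beforehand ...
SELECT 1 FROM test WHERE value = 66;
-- ... and only run the INSERT if the SELECT returned no rows
INSERT INTO test VALUES (66);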
Here's my thinking: let's suppose we make a SELECT first, before inserting. The SQL engine would have to scan the rows and return any matching tuples. If there are none, we make an INSERT, which presumably makes yet another scan to see whether the UNIQUE constraint is about to be violated. So, in theory, the second option would speed the execution up by 50%. Is this how PostgreSQL would actually behave?
We're assuming there's no ambiguity when it comes to the exception itself (e.g. we only have one UNIQUE constraint).
Is this a common practice? Are there any caveats to it? Are there any other alternatives?
Upvotes: 1
Views: 119
Reputation: 44250
You don't (and shouldn't) have to test beforehand; you can test while inserting. Just add the test as a WHERE clause. The following INSERT inserts either zero or one row, depending on whether a row with the same value already exists (and it is certainly not slower):
INSERT INTO test (value)
SELECT 55
WHERE NOT EXISTS (
    SELECT * FROM test
    WHERE value = 55
);
Though your error-driven approach may look elegant from the client side, from the database side it is a near-disaster: the current transaction is implicitly rolled back and all cursors (including prepared statements) are closed. (Thus your application will have to rebuild the complete transaction, this time without the error, and start again.)
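If you do keep the error-driven approach inside a larger transaction, a savepoint can at least limit the rollback to the failing statement. A rough sketch (the savepoint name is arbitrary):
BEGIN;
-- ... other work in the same transaction ...
SAVEPOINT before_insert;
INSERT INTO test VALUES (66);               -- may raise a unique_violation
-- on error: ROLLBACK TO SAVEPOINT before_insert;  -- undoes only the failed INSERT
COMMIT;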
Addition: when adding more than one row, you can put the VALUES() list into a CTE and refer to the CTE in the INSERT query:
WITH vvv(val) AS (
    VALUES (11), (22), (33), (44), (55), (66)
)
INSERT INTO test (value)
SELECT val FROM vvv
WHERE NOT EXISTS (
    SELECT *
    FROM test nx
    WHERE nx.value = vvv.val
);
-- SELECT * FROM test;
Upvotes: 1
Reputation: 23910
It depends. If your application UI normally allows entering duplicate values, then checking before inserting is strongly encouraged, because any error would invalidate the current transaction, consume sequence/serial values, fill the logs with error messages, etc.
But if your UI does not allow duplicates, and inserting a duplicate is only possible when somebody is using tricks (for example during vulnerability research), or is highly improbable, then I'd allow inserting without checking first.
As a unique constraint forces the creation of an index, this check is not slow, though it is definitely slightly slower than inserting and checking for rare errors. Postgres 9.5 will have ON CONFLICT DO NOTHING support, which would be both fast and safe; you'd check the number of rows inserted to detect duplicates.
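On 9.5 that would look roughly like this (using the same test table as above):
INSERT INTO test (value) VALUES (66)
ON CONFLICT DO NOTHING;
-- a reported row count of 0 means the value was already present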
Upvotes: 2