Reputation: 93943
I'm using Python to write to a postgres database:
sql_string = "INSERT INTO hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ");"
cursor.execute(sql_string)
But because some of my rows are identical, I get the following error:
psycopg2.IntegrityError: duplicate key value
violates unique constraint "hundred_pkey"
How can I write an 'INSERT unless this row already exists' SQL statement?
I've seen complex statements like this recommended:
IF EXISTS (SELECT * FROM invoices WHERE invoiceid = '12345')
UPDATE invoices SET billed = 'TRUE' WHERE invoiceid = '12345'
ELSE
INSERT INTO invoices (invoiceid, billed) VALUES ('12345', 'TRUE')
END IF
But firstly, is this overkill for what I need, and secondly, how can I execute one of those as a simple string?
Upvotes: 701
Views: 710433
Reputation: 1277
You can also use MERGE, which was introduced in Postgres 15.
However, it is not as straightforward as INSERT … ON CONFLICT DO NOTHING.
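For comparison, here is a rough sketch of what the MERGE variant might look like when called from Python. It assumes Postgres 15 or later, a psycopg2-style cursor, and that `name` is the column behind `hundred_pkey` (the question does not say which column it is). The function name `merge_hundred` is made up for illustration.

```python
# Sketch only: assumes Postgres 15+ and that `name` is the primary key.
MERGE_SQL = """
MERGE INTO hundred AS h
USING (VALUES (%s, %s, %s)) AS v (name, name_slug, status)
ON h.name = v.name
WHEN NOT MATCHED THEN
    INSERT (name, name_slug, status)
    VALUES (v.name, v.name_slug, v.status)
"""


def merge_hundred(cursor, name, name_slug, status):
    """Insert the row only when no row with the same name exists."""
    # Values are passed as parameters, never concatenated into the SQL.
    cursor.execute(MERGE_SQL, (name, name_slug, status))
```

The extra USING and WHEN NOT MATCHED clauses are why this reads as heavier than a plain ON CONFLICT DO NOTHING.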
Upvotes: 5
Reputation: 42
In Postgres version 9.5 or higher you can use ON CONFLICT to avoid constraint errors, as @Arie mentioned above. For more options related to this INSERT query, refer to the Postgres docs.
An alternative solution is to use try/catch (try/except in Python) to handle the runtime error.
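A minimal sketch of the error-handling route, assuming a DB-API connection such as the one psycopg2 provides. The helper name `insert_or_skip` is hypothetical, and the exception class is passed in by the caller (with psycopg2 that would be `psycopg2.IntegrityError`) so the sketch itself does not depend on any driver:

```python
def insert_or_skip(connection, sql, params, duplicate_error):
    """Run one INSERT; return False instead of raising on a duplicate key.

    `duplicate_error` is the exception class to swallow, for example
    psycopg2.IntegrityError when using psycopg2.
    """
    cursor = connection.cursor()
    try:
        cursor.execute(sql, params)
        connection.commit()
        return True
    except duplicate_error:
        # The failed statement aborts the open transaction, so roll it
        # back before the connection is used again.
        connection.rollback()
        return False
    finally:
        cursor.close()
```

Note that IntegrityError also covers other constraint violations (NOT NULL, foreign keys), so this may hide more than duplicates. It also commits per row, which can be slow for bulk loads.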
Upvotes: 1
Reputation: 30372
One approach would be to create a non-constrained (no unique indexes) table, insert all your data into it, and then do a SELECT DISTINCT from that table to populate your hundred table.
At a high level, the steps are as follows. I assume all three columns are distinct in my example, so in the final insert step change the NOT EXISTS join to only join on the unique columns in the hundred table.
Create temporary table. See docs here.
CREATE TEMPORARY TABLE temp_data(name, name_slug, status);
INSERT Data into temp table.
INSERT INTO temp_data(name, name_slug, status);
Add any indexes to the temp table.
Do main table insert.
INSERT INTO hundred(name, name_slug, status)
SELECT DISTINCT name, name_slug, status
FROM temp_data
WHERE NOT EXISTS (
SELECT 'X'
FROM hundred
WHERE
temp_data.name = hundred.name
AND temp_data.name_slug = hundred.name_slug
AND temp_data.status = hundred.status
);
Upvotes: 65
Reputation: 1
INSERT INTO invoices (invoiceid, billed) (
SELECT '12345','TRUE' WHERE NOT EXISTS (
SELECT 1 FROM invoices WHERE invoiceid='12345' AND billed='TRUE'
)
)
Upvotes: -1
Reputation: 399
We can simplify the query using an upsert:
insert into invoices (invoiceid, billed)
values ('12345', 'TRUE')
on conflict (invoiceid) do
update set billed=EXCLUDED.billed;
Upvotes: 9
Reputation: 12425
Postgres 9.5 (released on 2016-01-07) offers an "upsert" command, also known as an ON CONFLICT clause to INSERT:
INSERT ... ON CONFLICT DO NOTHING/UPDATE
It solves many of the subtle problems you can run into under concurrent operations, which the approaches in some other answers are exposed to.
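Applied to the table from the question, this could be sketched roughly as follows. It assumes Postgres 9.5 or later and a psycopg2-style cursor that uses `%s` placeholders; the function name `insert_hundred` is made up for illustration.

```python
# Sketch only: parameterized INSERT that silently skips duplicate keys.
INSERT_SQL = """
INSERT INTO hundred (name, name_slug, status)
VALUES (%s, %s, %s)
ON CONFLICT DO NOTHING
"""


def insert_hundred(cursor, name, name_slug, status):
    """Insert one row; return True if it was inserted, False if skipped."""
    # Passing values as parameters also fixes the quoting problems of
    # the string concatenation used in the question.
    cursor.execute(INSERT_SQL, (name, name_slug, status))
    # rowcount reports the rows affected: 1 when inserted, 0 when the
    # conflict made the statement do nothing.
    return cursor.rowcount == 1
```

Without a conflict target, DO NOTHING applies to any unique or exclusion constraint on the table, which is usually what is wanted for "insert unless it already exists".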
Upvotes: 1186
Reputation: 1471
This is exactly the problem I faced, and my version is 9.5.
I solved it with the SQL query below.
INSERT INTO example_table (id, name)
SELECT 1 AS id, 'John' AS name
WHERE NOT EXISTS(
SELECT id FROM example_table WHERE id = 1
);
Hope that will help someone who has the same issue with version >= 9.5.
Thanks for reading.
Upvotes: 21
Reputation: 7
If many of your rows are identical, checking for each one yourself means repeating the same check many times. Instead, you can send them all and let the database decide whether to insert each row, using the ON CONFLICT clause as follows:
sql_string = "INSERT INTO Hundred (name,name_slug,status) VALUES ("
sql_string += hundred + ", '" + hundred_slug + "', " + status + ") ON CONFLICT ON CONSTRAINT hundred_pkey DO NOTHING;"
cursor.execute(sql_string)
Upvotes: 1
Reputation: 35
Your column "hundred" seems to be defined as the primary key and therefore must be unique, which is not the case. The problem isn't with the statement, it is with your data.
I suggest you add an id column of type serial to handle the primary key.
Upvotes: 1
Reputation: 291
There is a nice way of doing a conditional INSERT in PostgreSQL using a WITH query, like:
WITH a as(
select
id
from
schema.table_name
where
column_name = your_identical_column_value
)
INSERT into
schema.table_name
(col_name1, col_name2)
SELECT
col_name1, col_name2
WHERE NOT EXISTS (
SELECT
id
FROM
a
)
RETURNING id
Upvotes: 8
Reputation: 630
The solution is simple, but not obvious.
If you want to use this statement, you must make one change to the database:
ALTER USER user SET search_path to 'name_of_schema';
After this change, "INSERT" will work correctly.
Upvotes: -19
Reputation: 1332
Here is a generic Python function that, given a table name, columns and values, generates the upsert equivalent for PostgreSQL.
import json


def upsert(table_name, id_column, other_columns, values_hash):
    # CTE-based upsert: UPDATE matching rows first, then INSERT only the
    # rows that the UPDATE did not touch.
    template = """
    WITH new_values ($$ALL_COLUMNS$$) as (
      values
         ($$VALUES_LIST$$)
    ),
    upsert as
    (
        update $$TABLE_NAME$$ m
            set
                $$SET_MAPPINGS$$
        FROM new_values nv
        WHERE m.$$ID_COLUMN$$ = nv.$$ID_COLUMN$$
        RETURNING m.*
    )
    INSERT INTO $$TABLE_NAME$$ ($$ALL_COLUMNS$$)
    SELECT $$ALL_COLUMNS$$
    FROM new_values
    WHERE NOT EXISTS (SELECT 1
                      FROM upsert up
                      WHERE up.$$ID_COLUMN$$ = new_values.$$ID_COLUMN$$)
    """
    all_columns = [id_column] + other_columns
    all_columns_csv = ",".join(all_columns)
    all_values_csv = ",".join(
        [query_value(values_hash[column_name]) for column_name in all_columns]
    )
    set_mappings = ",".join([c + " = nv." + c for c in other_columns])

    q = template
    q = q.replace("$$TABLE_NAME$$", table_name)
    q = q.replace("$$ID_COLUMN$$", id_column)
    q = q.replace("$$ALL_COLUMNS$$", all_columns_csv)
    q = q.replace("$$VALUES_LIST$$", all_values_csv)
    q = q.replace("$$SET_MAPPINGS$$", set_mappings)
    return q


def query_value(value):
    # Render a Python value as a SQL literal, doubling single quotes.
    if value is None:
        return "NULL"
    if isinstance(value, str):
        return "'%s'" % value.replace("'", "''")
    if isinstance(value, dict):
        return "'%s'" % json.dumps(value).replace("'", "''")
    if isinstance(value, bool):
        return "%s" % value
    if isinstance(value, int):
        return "%s" % value
    # Anything else (e.g. float) is rendered with str() so the join works.
    return str(value)


if __name__ == "__main__":
    my_table_name = 'mytable'
    my_id_column = 'id'
    my_other_columns = ['field1', 'field2']
    my_values_hash = {
        'id': 123,
        'field1': "john",
        'field2': "doe"
    }
    print(upsert(my_table_name, my_id_column, my_other_columns, my_values_hash))
Upvotes: -2
Reputation:
It's easy with rules:
CREATE RULE file_insert_defer AS ON INSERT TO file
WHERE (EXISTS ( SELECT * FROM file WHERE file.id = new.id)) DO INSTEAD NOTHING
But it fails with concurrent writes ...
Upvotes: 3
Reputation: 188
INSERT .. WHERE NOT EXISTS is a good approach, and race conditions can be avoided by wrapping it in a transaction "envelope":
BEGIN;
LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE;
INSERT ... ;
COMMIT;
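From Python, the same envelope might be sketched like this. It assumes a psycopg2-style connection in its default non-autocommit mode, and that `name` is the column to check for duplicates; the function name `locked_insert` is hypothetical.

```python
# Sketch only: serialize writers with a table lock, then insert if absent.
LOCK_SQL = "LOCK TABLE hundred IN SHARE ROW EXCLUSIVE MODE"
INSERT_SQL = """
INSERT INTO hundred (name, name_slug, status)
SELECT %s, %s, %s
WHERE NOT EXISTS (SELECT 1 FROM hundred WHERE name = %s)
"""


def locked_insert(connection, name, name_slug, status):
    """Insert the row unless one with the same name already exists."""
    cursor = connection.cursor()
    try:
        # In non-autocommit mode the driver opens the transaction on the
        # first statement, so no explicit BEGIN is issued here.
        cursor.execute(LOCK_SQL)
        cursor.execute(INSERT_SQL, (name, name_slug, status, name))
        connection.commit()  # committing also releases the table lock
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()
```

The lock makes concurrent writers queue up behind each other, so this trades throughput for safety.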
Upvotes: 4
Reputation: 664
psycopg's cursor class has the attribute rowcount.
This read-only attribute specifies the number of rows that the last execute*() produced (for DQL statements like SELECT) or affected (for DML statements like UPDATE or INSERT).
So you could try UPDATE first and INSERT only if rowcount is 0.
But depending on activity levels in your database you may hit a race condition between UPDATE and INSERT where another process may create that record in the interim.
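A minimal sketch of that UPDATE-then-INSERT flow, using the invoices table from the question and assuming a psycopg2-style cursor. The function name `update_or_insert` is made up, and the race condition described above still applies:

```python
UPDATE_SQL = "UPDATE invoices SET billed = %s WHERE invoiceid = %s"
INSERT_SQL = "INSERT INTO invoices (invoiceid, billed) VALUES (%s, %s)"


def update_or_insert(cursor, invoiceid, billed):
    """Try UPDATE first; INSERT only if no row was updated."""
    cursor.execute(UPDATE_SQL, (billed, invoiceid))
    if cursor.rowcount == 0:
        # No existing row matched. Another session could still insert
        # the same key right here, so the INSERT may raise a duplicate
        # key error under concurrent writes.
        cursor.execute(INSERT_SQL, (invoiceid, billed))
        return "inserted"
    return "updated"
```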
Upvotes: 1
Reputation: 9764
How can I write an 'INSERT unless this row already exists' SQL statement?
There is a nice way of doing conditional INSERT in PostgreSQL:
INSERT INTO example_table
(id, name)
SELECT 1, 'John'
WHERE
NOT EXISTS (
SELECT id FROM example_table WHERE id = 1
);
CAVEAT: This approach is not 100% reliable for concurrent write operations, though. There is a very tiny race condition between the SELECT in the NOT EXISTS anti-semi-join and the INSERT itself. It can fail under such conditions.
Upvotes: 498
Reputation: 8324
I was looking for a similar solution, trying to find SQL that would work in PostgreSQL as well as HSQLDB. (HSQLDB was what made this difficult.) Using your example as a basis, this is the format that I found elsewhere.
sql = "INSERT INTO hundred (name,name_slug,status)"
sql += " ( SELECT " + hundred + ", '" + hundred_slug + "', " + status
sql += " FROM hundred"
sql += " WHERE name = " + hundred + " AND name_slug = '" + hundred_slug + "' AND status = " + status
sql += " HAVING COUNT(*) = 0 );"
Upvotes: -2
Reputation: 735
The approach with the most upvotes (from John Doe) does partly work for me, but in my case I get only 180 of the expected 422 rows. I couldn't find anything wrong and there are no errors at all, so I looked for a different, simple approach.
Using IF NOT FOUND THEN after a SELECT just works perfectly for me (described in the PostgreSQL documentation).
Example from documentation:
SELECT * INTO myrec FROM emp WHERE empname = myname;
IF NOT FOUND THEN
RAISE EXCEPTION 'employee % not found', myname;
END IF;
Upvotes: 1
Reputation: 756
I know this question is from a while ago, but thought this might help someone. I think the easiest way to do this is via a trigger. E.g.:
Create Function ignore_dups() Returns Trigger
As $$
Begin
If Exists (
Select
*
From
hundred h
Where
-- Assuming all three fields are primary key
h.name = NEW.name
And h.name_slug = NEW.name_slug
And h.status = NEW.status
) Then
Return NULL;
End If;
Return NEW;
End;
$$ Language plpgsql;
Create Trigger ignore_dups
Before Insert On hundred
For Each Row
Execute Procedure ignore_dups();
Execute this code from a psql prompt (or however you like to execute queries directly on the database). Then you can insert as normal from Python. E.g.:
sql = "Insert Into hundred (name, name_slug, status) Values (%s, %s, %s)"
cursor.execute(sql, (hundred, hundred_slug, status))
Note that as @Thomas_Wouters already mentioned, the code above takes advantage of parameters rather than concatenating the string.
Upvotes: 10
Reputation:
You can make use of VALUES - available in Postgres:
INSERT INTO person (name)
SELECT name FROM person
UNION
VALUES ('Bob')
EXCEPT
SELECT name FROM person;
Upvotes: 17
Reputation: 425713
Unfortunately, PostgreSQL supports neither MERGE nor ON DUPLICATE KEY UPDATE, so you'll have to do it in two statements:
UPDATE invoices
SET billed = 'TRUE'
WHERE invoiceid = '12345';
INSERT
INTO invoices (invoiceid, billed)
SELECT '12345', 'TRUE'
WHERE '12345' NOT IN
(
SELECT invoiceid
FROM invoices
)
You can wrap it into a function:
CREATE OR REPLACE FUNCTION fn_upd_invoices(id VARCHAR(32), billed VARCHAR(32))
RETURNS VOID
AS
$$
UPDATE invoices
SET billed = $2
WHERE invoiceid = $1;
INSERT
INTO invoices (invoiceid, billed)
SELECT $1, $2
WHERE $1 NOT IN
(
SELECT invoiceid
FROM invoices
);
$$
LANGUAGE 'sql';
and just call it:
SELECT fn_upd_invoices('12345', 'TRUE')
Upvotes: 20