Reputation: 7429
I'm using Postgres 9.5 and seeing some weird things here.
I have a cron job running every 5 minutes, firing a SQL statement that adds a list of records if they don't exist yet.
INSERT INTO
sometable (customer, balance)
VALUES
(:customer, :balance)
ON CONFLICT (customer) DO NOTHING
sometable.customer is a primary key (text)
sometable structure is:
id: serial
customer: text
balance: bigint
Now it seems like every time this job runs, the id field is silently incremented by 1. So the next time I actually add a record, its id is thousands of numbers above my last value. I thought this query checks for conflicts and, if there is one, does nothing, but currently it seems like it tries to insert the record, increments the id, and then stops.
Any suggestions?
Upvotes: 64
Views: 30044
Reputation: 652
From version 15, PostgreSQL supports the MERGE statement, which can avoid this up-front sequence consumption. Here is the test table structure:
create table sometable (
id serial not null primary key,
customer varchar(20),
balance int);
create unique index uk_sometable on sometable (customer);
The following MERGE statement can be executed several times to verify how many sequence values are consumed:
with record as (select 'cust-001' as customer, 1000 as balance)
merge into sometable s using record r on s.customer = r.customer
when matched then do nothing
when not matched then insert (customer, balance)
values (r.customer, r.balance);
And now a normal INSERT statement:
insert into sometable (customer, balance) values ('cust-002', 2000);
At this point, select * from sometable
shows that the id of 'cust-002' is 2, with no gap in the sequence.
Upvotes: 1
Reputation: 1734
I have run into this problem as well, where I was trying to use a UNIQUE column alongside an autoincrementing INTEGER or BIGINT to compress/deduplicate long string values that have low cardinality (many repeated values). Simply executing INSERT ... ON CONFLICT DO NOTHING every time was leading to the sequence having far more gaps than useful values, to the point of risking exhaustion/wraparound of sequence numbers.
I could think of two approaches, each with their own tradeoffs (a rough sketch of both follows the footnote below):
1. LOCK the table in EXCLUSIVE (*) mode, then do a SELECT to check for an existing row, followed by a simple INSERT only if necessary. This will pretty much guarantee no gaps in the sequence, but will bottleneck throughput on that table. You might be able to combine this with a local in-memory cache of existing value->ID mappings to avoid the lock where possible, assuming you never modify the rows once inserted. You could also do a SELECT before locking to see if the value already exists and avoid the lock when rows are more likely than not to already exist (this is also known as "double-checked locking").
2. Don't lock the table; do a SELECT to check for an existing row, then an INSERT .. ON CONFLICT DO NOTHING RETURNING id if necessary, then a final SELECT only if nothing was returned by the INSERT. This can greatly reduce the number of sequence increments, though lots of concurrently running attempts to insert the same value can still create gaps. This is best done in autocommit mode (not inside a transaction), since transactions (which can't go below READ COMMITTED isolation level in Postgres) can delay the visibility of the inserted row, causing more unnecessary sequence increments.
(*) = EXCLUSIVE is the strongest table lock mode that still allows concurrent SELECT statements, as long as they don't lock rows (i.e., without FOR SHARE, FOR UPDATE, etc.). This can be weakened if needed, as long as the lock is still self-exclusive; the weakest choice is thus SHARE UPDATE EXCLUSIVE, which would allow concurrent INSERT, UPDATE, and DELETE. In general, for this pattern to guarantee gapless sequence numbers, only concurrent UPDATEs to non-key columns of existing rows are safe. See Explicit Locking in the PostgreSQL documentation for more details about the lock levels.
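A minimal SQL sketch of both approaches, assuming a hypothetical table dedup_strings (id bigserial primary key, val text unique); the table and column names here are only illustrative, not part of the original answer:
-- Approach 1: serialize writers with a self-exclusive table lock (gapless)
BEGIN;
LOCK TABLE dedup_strings IN EXCLUSIVE MODE;
SELECT id FROM dedup_strings WHERE val = 'some long string';
-- only if the SELECT above returned no row:
INSERT INTO dedup_strings (val) VALUES ('some long string') RETURNING id;
COMMIT;

-- Approach 2: no table lock; gaps are rare but still possible under concurrency
SELECT id FROM dedup_strings WHERE val = 'some long string';   -- step 1
-- only if step 1 returned no row:
INSERT INTO dedup_strings (val) VALUES ('some long string')
ON CONFLICT (val) DO NOTHING
RETURNING id;                                                   -- step 2
-- only if step 2 returned no row (someone else inserted it concurrently):
SELECT id FROM dedup_strings WHERE val = 'some long string';   -- step 3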
Upvotes: 0
Reputation: 1446
Well, there is a technique that allows you to do stuff like that. It is called an insert mutex. It is old, old, old, but it works.
https://www.percona.com/blog/2011/11/29/avoiding-auto-increment-holes-on-innodb-with-insert-ignore/
Generally the idea is that you do an INSERT ... SELECT, and if your values already exist the SELECT returns no rows, which of course prevents the INSERT, so the sequence is not incremented. A bit mind-boggling, but perfectly valid and performant.
This of course completely ignores ON DUPLICATE, but one gets back control over the auto-increment counter.
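In Postgres, the equivalent pattern would be an INSERT ... SELECT guarded by NOT EXISTS; a rough sketch against the question's sometable (note that two sessions racing on the same customer can still both pass the NOT EXISTS check, in which case one of them fails on the unique constraint and does consume a sequence value):
INSERT INTO sometable (customer, balance)
SELECT :customer, :balance
WHERE NOT EXISTS (
    SELECT 1 FROM sometable WHERE customer = :customer
);
-- nextval() is only called for rows the SELECT actually produces,
-- so the sequence does not advance when the customer already exists.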
Upvotes: -4
Reputation: 97718
The reason this feels weird to you is that you are thinking of the increment on the counter as part of the insert operation, and therefore the "DO NOTHING" ought to mean "don't increment anything". You're picturing this:
1. Check the values provided against existing rows.
2. If there is a conflict, abort and do nothing.
3. Otherwise, increment the counter and insert the row.
But in fact, the increment has to happen before the insert is attempted. A SERIAL column in Postgres is implemented as a DEFAULT which executes the nextval() function on a bound SEQUENCE. Before the DBMS can do anything with the data, it's got to have a complete set of columns, so the order of operations is like this:
1. Fill in any missing columns with their defaults, which means calling nextval() to increment the counter.
2. Check the values provided against existing rows.
3. If there is a conflict, abort and do nothing; otherwise, insert the row.
This can be seen intuitively if the duplicate key is in the autoincrement field itself:
CREATE TABLE foo ( id SERIAL NOT NULL PRIMARY KEY, bar text );
-- Insert row 1
INSERT INTO foo ( bar ) VALUES ( 'test' );
-- Reset the sequence so that the next nextval() call returns 1 again
SELECT setval(pg_get_serial_sequence('foo', 'id'), 1, false);
-- Attempt to insert row 1 again
INSERT INTO foo ( bar ) VALUES ( 'test 2' )
ON CONFLICT (id) DO NOTHING;
Clearly, this can't know if there's a conflict without incrementing the sequence, so the "do nothing" has to come after that increment.
Upvotes: 60
Reputation: 5599
As already said by @a_horse_with_no_name and @Serge Ballesta, serials are always incremented even if the INSERT fails.
You can try to "roll back" the serial value to the maximum id actually used by changing the corresponding sequence:
SELECT setval('sometable_id_seq', MAX(id), true) FROM sometable;
Upvotes: 8
Reputation: 148965
As said by @a_horse_with_no_name, that is by design. Serial type fields are implemented under the hood through sequences, and for obvious reasons, once you have gotten a new value from a sequence, you cannot roll back the last value. Imagine the following scenario:
- transaction A gets the next value from the sequence (say 42) and has not yet committed;
- transaction B gets the following value (43) and commits;
- transaction A rolls back.
The value 42 cannot be handed back to the sequence without either blocking B until A finishes or risking a duplicate.
That is the reason why sequences (and serial fields) just document that, in case of rolled-back transactions, holes can occur in the returned values. Only uniqueness is guaranteed.
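A minimal illustration of such a hole (the table name demo is just for this example):
CREATE TABLE demo ( id SERIAL NOT NULL PRIMARY KEY, val text );
BEGIN;
INSERT INTO demo ( val ) VALUES ( 'a' );  -- consumes id 1
ROLLBACK;                                 -- the row is gone, but the sequence stays advanced
INSERT INTO demo ( val ) VALUES ( 'b' );  -- gets id 2, leaving a hole at 1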
Upvotes: 4