Reputation: 7429
I'm using Postgres 9.5 and seeing some weird things here.
I have a cron job running every 5 minutes, firing a SQL statement that adds a list of records if they don't exist yet.
INSERT INTO
sometable (customer, balance)
VALUES
(:customer, :balance)
ON CONFLICT (customer) DO NOTHING
sometable.customer is a primary key (text)
sometable structure is:
id: serial
customer: text
balance: bigint
Now it seems like every time this job runs, the id field is silently incremented by 1. So the next time I actually add a record, its id is thousands of numbers above my last value. I thought this query checks for conflicts and, if there is one, does nothing, but currently it seems like it tries to insert the record, increments the id, and then stops.
Any suggestions?
Upvotes: 64
Views: 30044
Reputation: 652
From version 15, PostgreSQL supports the MERGE statement, which can avoid this up-front sequence consumption. Here is the test table structure:
create table sometable (
id serial not null primary key,
customer varchar(20),
balance int);
create unique index uk_sometable on sometable (customer);
The following MERGE statement can be executed several times to verify how many sequence values are consumed:
with record as (select 'cust-001' as customer, 1000 as balance)
merge into sometable s using record r on s.customer = r.customer
when matched then do nothing
when not matched then insert (customer, balance)
values (r.customer, r.balance);
And now a normal INSERT statement:
insert into sometable (customer, balance) values ('cust-002', 2000);
At this point, select * from sometable
shows that the id of 'cust-002' is 2, with no gap in the sequence.
Upvotes: 1
Reputation: 1734
I have run into this problem as well, where I was trying to use a UNIQUE column alongside an autoincrementing INTEGER or BIGINT to compress/deduplicate long string values that have low cardinality (many repeated values). Simply executing INSERT ... ON CONFLICT DO NOTHING every time was leading to the sequence having far more gaps than useful values, to the point of risking exhaustion/wraparound of sequence numbers.
I could think of two approaches, each with their own tradeoffs (a rough sketch of both follows the footnote below):
1. LOCK the table in EXCLUSIVE (*) mode, then do a SELECT to check for an existing row, followed by a simple INSERT only if necessary. This will pretty much guarantee no gaps in the sequence, but will bottleneck throughput on that table. You might be able to combine this with a local in-memory cache of existing value->ID mappings to avoid the lock where possible, assuming you never modify the rows once inserted. You could also do a SELECT before locking to see if the value already exists and avoid the lock when rows are more likely than not to already exist (this is also known as "double-checked locking").
2. Don't lock the table; do a SELECT to check for an existing row, then an INSERT .. ON CONFLICT DO NOTHING RETURNING id if necessary, then a final SELECT only if nothing was returned by the INSERT. This can greatly reduce the number of sequence increments, though lots of concurrently running attempts to insert the same value can still create gaps. This is best done in autocommit mode (not inside a transaction), since transactions (which can't go below READ COMMITTED isolation level in Postgres) can delay the visibility of the inserted row, causing more unnecessary sequence increments.
(*) = EXCLUSIVE is the strongest table lock mode that still allows concurrent SELECT statements, as long as they don't lock rows (i.e., without FOR SHARE, FOR UPDATE, etc.). This can be weakened if needed, as long as the lock is still self-exclusive; the weakest choice is thus SHARE UPDATE EXCLUSIVE, which would allow concurrent INSERT, UPDATE, and DELETE. In general, for this pattern to guarantee gapless sequence numbers, only concurrent UPDATEs to non-key columns of existing rows are safe. See Explicit Locking in the PostgreSQL documentation for more details about the lock levels.
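A minimal SQL sketch of both approaches, assuming a hypothetical table dedup_strings (id bigserial primary key, val text unique); the table and column names here are only illustrative, not part of the original answer:
-- Approach 1: serialize writers with a self-exclusive table lock (gapless)
BEGIN;
LOCK TABLE dedup_strings IN EXCLUSIVE MODE;
SELECT id FROM dedup_strings WHERE val = 'some long string';
-- only if the SELECT above returned no row:
INSERT INTO dedup_strings (val) VALUES ('some long string') RETURNING id;
COMMIT;

-- Approach 2: no table lock; gaps are rare but still possible under concurrency
SELECT id FROM dedup_strings WHERE val = 'some long string';   -- step 1
-- only if step 1 returned no row:
INSERT INTO dedup_strings (val) VALUES ('some long string')
ON CONFLICT (val) DO NOTHING
RETURNING id;                                                   -- step 2
-- only if step 2 returned no row (someone else inserted it concurrently):
SELECT id FROM dedup_strings WHERE val = 'some long string';   -- step 3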
Upvotes: 0
Reputation: 1446
Well, there is a technique that allows you to do stuff like that. It is called an insert mutex. It is old, old, old, but it works.
https://www.percona.com/blog/2011/11/29/avoiding-auto-increment-holes-on-innodb-with-insert-ignore/
Generally the idea is that you do an INSERT ... SELECT, and if your values already exist the SELECT returns no rows, which of course prevents the INSERT, so the sequence is not incremented. A bit mind-boggling, but perfectly valid and performant.
This of course completely ignores ON DUPLICATE, but one gets back control over the auto-increment counter.
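In Postgres, the equivalent pattern would be an INSERT ... SELECT guarded by NOT EXISTS; a rough sketch against the question's sometable (note that two sessions racing on the same customer can still both pass the NOT EXISTS check, in which case one of them fails on the unique constraint and does consume a sequence value):
INSERT INTO sometable (customer, balance)
SELECT :customer, :balance
WHERE NOT EXISTS (
    SELECT 1 FROM sometable WHERE customer = :customer
);
-- nextval() is only called for rows the SELECT actually produces,
-- so the sequence does not advance when the customer already exists.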
Upvotes: -4
Reputation: 97718
The reason this feels weird to you is that you are thinking of the increment on the counter as part of the insert operation, and therefore the "DO NOTHING" ought to mean "don't increment anything". You're picturing this:
1. Check the values provided against existing rows.
2. If there is a conflict, abort and do nothing.
3. Otherwise, increment the counter and insert the row.
But in fact, the increment has to happen before the insert is attempted. A SERIAL column in Postgres is implemented as a DEFAULT which executes the nextval() function on a bound SEQUENCE. Before the DBMS can do anything with the data, it's got to have a complete set of columns, so the order of operations is like this:
1. Fill in any missing columns with their defaults, which means calling nextval() to increment the counter.
2. Check the values provided against existing rows.
3. If there is a conflict, abort and do nothing; otherwise, insert the row.
This can be seen intuitively if the duplicate key is in the autoincrement field itself:
CREATE TABLE foo ( id SERIAL NOT NULL PRIMARY KEY, bar text );
-- Insert row 1
INSERT INTO foo ( bar ) VALUES ( 'test' );
-- Reset the sequence so that the next nextval() call returns 1 again
SELECT setval(pg_get_serial_sequence('foo', 'id'), 1, false);
-- Attempt to insert row 1 again
INSERT INTO foo ( bar ) VALUES ( 'test 2' )
ON CONFLICT (id) DO NOTHING;
Clearly, this can't know if there's a conflict without incrementing the sequence, so the "do nothing" has to come after that increment.
Upvotes: 60
Reputation: 5599
As already said by @a_horse_with_no_name and @Serge Ballesta, serials are always incremented even if the INSERT fails.
You can try to "roll back" the serial value to the maximum id actually used by changing the corresponding sequence:
SELECT setval('sometable_id_seq', MAX(id), true) FROM sometable;
Upvotes: 8
Reputation: 148965
As said by @a_horse_with_no_name, that is by design. Serial type fields are implemented under the hood through sequences, and for obvious reasons, once you have gotten a new value from a sequence, you cannot roll back the last value. Imagine the following scenario:
- transaction A gets the next value from the sequence (say 42) and has not yet committed;
- transaction B gets the following value (43) and commits;
- transaction A rolls back.
The value 42 cannot be handed back to the sequence without either blocking B until A finishes or risking a duplicate.
That is the reason why sequences (and serial fields) just document that, in case of rolled-back transactions, holes can occur in the returned values. Only uniqueness is guaranteed.
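A minimal illustration of such a hole (the table name demo is just for this example):
CREATE TABLE demo ( id SERIAL NOT NULL PRIMARY KEY, val text );
BEGIN;
INSERT INTO demo ( val ) VALUES ( 'a' );  -- consumes id 1
ROLLBACK;                                 -- the row is gone, but the sequence stays advanced
INSERT INTO demo ( val ) VALUES ( 'b' );  -- gets id 2, leaving a hole at 1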
Upvotes: 4