Reputation: 4643
We have a very simple table; the DDL is as follows:
CREATE TABLE public.next_id (
id varchar(255) NOT NULL,
next_value int8 NOT NULL,
CONSTRAINT next_id_pk PRIMARY KEY (id)
);
The table only has about 120 rows, and does not have any indexes apart from the primary key.
When I run the following UPDATE query in DBeaver against either of two Linux servers, one running PostgreSQL 10.5 and the other 11.2, it takes approx 50ms:
update NEXT_ID set next_value=next_value+1 where id='Session';
However, if I point DBeaver at a server running PostgreSQL 9.5.3, it takes approx 3ms on average.
Now if I create a FOR loop as follows:
do $$
begin
for i in 1..100000 loop
update NEXT_ID set next_value=next_value+1 where id='Session';
end loop;
end;
$$;
It takes approximately the same time (~1.5s) on all of the machines. That works out to roughly 15µs per UPDATE (1.5s / 100,000) when they all run inside one transaction, so the slow servers' extra ~50ms is clearly not a per-statement cost.
It feels like there is some per-transaction overhead wrapped around each standalone statement.
How can I get more information about where PostgreSQL is spending the time?
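For reference, one way to capture per-statement timings (and the parse/bind/execute duration lines shown in UPDATE2 below) is to enable duration logging; a minimal sketch, assuming superuser access, and noisy enough that you would only want it on a test box:
-- a threshold of 0 logs the duration of every statement
ALTER SYSTEM SET log_min_duration_statement = 0;
SELECT pg_reload_conf();  -- apply without a restart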
I have tried doing an EXPLAIN ANALYSE on the above single-record UPDATE on the 'slower' servers, and I get the following:
Update on next_id (cost=0.00..2.58 rows=1 width=36) (actual time=0.057..0.057 rows=0 loops=1)
-> Seq Scan on next_id (cost=0.00..2.58 rows=1 width=36) (actual time=0.043..0.044 rows=1 loops=1)
Filter: ((id)::text = 'Session'::text)
Rows Removed by Filter: 125
Planning Time: 0.066 ms
Execution Time: 0.080 ms
That seems to indicate that the query is actually only taking a fraction of a millisecond to plan and execute. So where is the rest of the time being spent?
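One quick way to see the gap, as a sketch using psql rather than DBeaver: compare the server-side numbers from EXPLAIN ANALYSE with the round-trip time reported by psql's \timing, which includes the implicit commit and the network hop:
\timing on
update NEXT_ID set next_value=next_value+1 where id='Session';
-- psql then prints something like "Time: 50.1 ms": the full round trip,
-- versus the fraction of a millisecond of planning + execution shown above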
All the servers involved are using the same database that has been replicated across all of them.
By the way, I'm not interested in anyone telling me that using a VARCHAR(255) for a primary key is a bad idea, as this is the same across all the servers, and it's not the point of this question.
UPDATE: We've noticed that a major difference between a Linux machine that is slow (mine) and another that is fast is the filesystem. My machine has the Postgres data directory on BTRFS, while the faster machine uses XFS.
A quick Google search for Postgres on various filesystems turns up a few people saying that running Postgres on BTRFS is (don't cross the streams!) bad.
We are going to attempt to reformat my machine to use XFS to see if that makes a difference.
In the meantime, I'm still interested in giving the bounty to anyone who can tell me how to log where the extra time is being spent.
UPDATE2: Following the suggestion by Nick Barnes in the comments, I ran a sequence of BEGIN; UPDATE ...; COMMIT; statements explicitly, and the log gave me the following output:
LOG: duration: 0.025 ms parse <unnamed>: begin
LOG: duration: 0.014 ms bind <unnamed>: begin
LOG: duration: 0.003 ms execute <unnamed>: begin
LOG: duration: 0.045 ms parse <unnamed>: update NEXT_ID set next_value=next_value+1 where id='Session'
LOG: duration: 0.055 ms bind <unnamed>: update NEXT_ID set next_value=next_value+1 where id='Session'
LOG: duration: 0.059 ms execute <unnamed>: update NEXT_ID set next_value=next_value+1 where id='Session'
LOG: duration: 0.004 ms parse <unnamed>: commit
LOG: duration: 0.003 ms bind <unnamed>: commit
LOG: duration: 50.237 ms execute <unnamed>: commit
So yes, Nick, the overhead is definitely in the COMMIT. But what is it doing? Is there any way to get more detail in the log about where that 50ms goes?
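One OS-level approach that occurs to me (an assumption on my part, not Postgres logging): find the backend PID for the session and attach strace to it, so the time spent in each sync call during the COMMIT shows up directly:
SELECT pg_backend_pid();  -- run in the same session; say it returns 12345
-- then, as root on the server (12345 being the hypothetical PID from above):
--   strace -T -e trace=fsync,fdatasync -p 12345
-- the -T flag prints the time spent inside each syscall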
Upvotes: 7
Views: 580
Reputation: 21356
The UPDATE itself is fairly cheap; before the transaction is committed, there is no need for your new data to survive a crash, so the changes are only made to in-memory buffers (which the server flushes to disk at its leisure).
It's not until you commit the transaction that the server needs to give you a guarantee of durability. Postgres handles crash safety using a write-ahead log (WAL), and when you COMMIT, you're waiting for the WAL to be synchronously written to disk.
This makes the commit latency highly dependent on the file system and the underlying hardware, and if your PG10 instance is waiting around for BTRFS to finish a copy-on-write or something, that could certainly account for the difference you're seeing.
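To put a number on that, the pg_test_fsync utility that ships with Postgres benchmarks each wal_sync_method against a given location; running it on both machines should make a BTRFS-vs-XFS difference obvious. The path below is only an example; point it at your actual WAL directory (pg_wal on v10+, pg_xlog on 9.x):
# run as the postgres OS user; creates and syncs a scratch file at that path
pg_test_fsync -f /var/lib/postgresql/data/pg_wal/pg_test_fsync.tmp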
To confirm that this is the cause, you can skip the synchronous disk write by disabling fsync (though this puts you at risk of data corruption, so be sure to test it on a throwaway instance). A safer and less intrusive option is to SET LOCAL synchronous_commit = off at the start of your transaction, which should have the same effect, provided you don't have synchronous replication running.
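As a concrete sketch of that second option (SET LOCAL only affects the current transaction, so nothing needs resetting afterwards):
BEGIN;
SET LOCAL synchronous_commit = off;  -- this commit won't wait for the WAL flush
update NEXT_ID set next_value=next_value+1 where id='Session';
COMMIT;  -- returns immediately; the WAL is flushed in the background (within about 3x wal_writer_delay)
Unlike fsync = off, the worst case here is losing the last few commits after a crash; it cannot corrupt the database.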
Upvotes: 3