Reputation: 505
I'm curious about this behaviour:
test_db=# create table test_table(id bigint);
test_db=# insert into test_table(id) select * from generate_series(1, 1000000);
test_db=# select * from test_table offset 100000 limit 1;
id
-------
87169
(1 row)
test_db=# select * from test_table offset 100000 limit 1;
id
--------
186785
(1 row)
test_db=# select * from test_table offset 100000 limit 1;
id
--------
284417
(1 row)
It seems that Postgres iterates forward with some randomizing rule. Why does a large offset "mix" the table? After that, if we use a small offset, it returns a "stable" value:
test_db=# select * from test_table offset 1 limit 1;
id
--------
282050
(1 row)
test_db=# select * from test_table offset 1 limit 1;
id
--------
282050
(1 row)
Upvotes: 4
Views: 546
Reputation: 44305
PostgreSQL has a feature that tries to get multiple concurrent sequential scans on the same large table to all work on the same part of the table at the same time, so that they can share cache space and don't have to each read the same data off disk individually. A side effect of this is that for partial (like with LIMIT) sequential scans done consecutively, each one starts where the previous one left off.
The synchronization points are always at page boundaries, so with a low OFFSET and a low LIMIT you just keep reading data from the same page (and from that page's beginning) over and over again and getting the same data.
You can turn this off with
SET synchronize_seqscans TO off;
if you need to get more stable results for some internal testing purpose. If you do this, you are, as von Neumann might say, living in a state of sin.
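As a minimal sketch of checking this in a single session (assuming the planner still picks a plain sequential scan and nothing else is modifying the table), you can disable the feature for the session and repeat the partial scan:
-- Disable synchronized sequential scans for this session only,
-- so every sequential scan starts at the beginning of the table.
SET synchronize_seqscans TO off;

-- Repeating this now returns the same (still unspecified!) row each time,
-- because each scan begins at block 0 instead of where the last one stopped.
SELECT * FROM test_table OFFSET 100000 LIMIT 1;
SELECT * FROM test_table OFFSET 100000 LIMIT 1;

-- Restore the default afterwards.
RESET synchronize_seqscans;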
Upvotes: 2
Reputation: 19653
Since the table records are not physically ordered, it is imperative to use ORDER BY in an OFFSET .. LIMIT query. Otherwise the row order is unspecified and you might get seemingly random results:
SELECT *
FROM test_table
ORDER BY id
OFFSET 100000
LIMIT 1;
Demo: db<>fiddle
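To convince yourself that a bare sequential scan simply follows the physical row order, you can peek at the ctid system column (the row's physical block/item location). This is purely illustrative; ctid values can change after updates or VACUUM FULL, so never rely on them for ordering:
-- ctid is the physical location (block, item) of each row version.
-- Without ORDER BY, a sequential scan walks the heap in this order.
SELECT ctid, id
FROM test_table
LIMIT 5;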
Upvotes: 3