user606521

Reputation: 15454

Repeatable pagination with randomly ordered rows

I have an API that returns some paginated rows from the DB. It works, but when I order rows by RANDOM() I get duplicates on consecutive pages. Is there any option to set a random seed per query?

If not, is it possible to set the random seed globally to force RANDOM() to generate the same values per query? Then I could just change the global seed every 3 minutes or so...


I use this code:

SELECT * FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5

Now I want to pass a seed to this query so I can paginate the random results. Should I do it like this?

SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 5 LIMIT 5
SELECT "table".*, SETSEED(0.1) FROM "table" ORDER BY RANDOM() OFFSET 10 LIMIT 5

And the results will then be correctly paginated?

Upvotes: 18

Views: 6073

Answers (5)

Devin

Reputation: 911

A very quick and dirty option for a table with primary keys that are UUIDs (or some other kind of hash) is to order the results by substrings of the UUIDs. In your app, randomly generate a few positive integers between 1 and 36 and use them as character positions in your query.

For instance, if I want five numbers and generate {23, 12, 35, 16, 3}, I'd use:

select * from (
  select
    organization_id,
    substring(cast(organization_id as text) from 23 for 1) as o1,
    substring(cast(organization_id as text) from 12 for 1) as o2,
    substring(cast(organization_id as text) from 35 for 1) as o3,
    substring(cast(organization_id as text) from 16 for 1) as o4,
    substring(cast(organization_id as text) from 3 for 1) as o5
  from channels_organizations
) t order by o1, o2, o3, o4, o5;

This would generate results like the following, which you can paginate easily:

organization_id                       o1  o2  o3  o4  o5
a059cd76-9d91-48db-8982-986fcd217b2a   2   9   2   8   5
3ce14f26-3e56-46eb-9a74-22862cc3ed4e   4   5   4   6   e
8e115b7e-2e7e-480e-9bc6-296deff3ed87   6   7   8   8   1
e1969c52-5028-47da-92ea-9f2918dcbf4d   a   2   4   7   9
42eb7292-e881-4a04-b83a-3bf78548dab4   a   8   b   a   e
e8a33112-532f-4fec-b25b-416c5409ac7e   b   2   7   f   a
a763efaa-79a4-4cfa-92bc-803ebc5ff221   c   a   2   c   6
581cae5b-5000-4aa6-837d-002ccf806e28   d   0   2   a   1
6b0ed7b4-b44d-4a51-910f-f3f2f0354d55   f   4   5   a   0
9369a547-f7e0-43e7-96ef-62bf631a0f0b   f   e   0   3   6

In this example, I pick an index within each group of the UUID format and shuffle them before sending to the DB.

You could simplify this if you're less concerned with pseudorandomness by using just a single substring index, although you'd want to increase the substring length (the n in from x for n) to at least 3 since there's a chance you'd end up with a dash in your substring.
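
For instance, a minimal sketch of that single-substring variant, reusing the table from the example above (the starting position 3 stands in for a value your app would pick at random):

select * from (
  select
    organization_id,
    substring(cast(organization_id as text) from 3 for 3) as sort_key
  from channels_organizations
) t order by sort_key, organization_id;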

Upvotes: 0

Nathan Long

Reputation: 126072

If the order needs to be "shuffled" but not truly random...

(Update: see my other answer for a more flexible and randomizable solution.)

You say "random" order, which is what you get when calling ORDER BY random() - for each row, PostgreSQL calls random(), gets a value, and uses that to decide how to sort that row within the set of results.

To make this repeatable, you have to mess with seeds. This feels icky. According to the docs:

the effects will persist until the end of the session, unless overridden by another SET

I think this means that when using a connection pool, setseed mutates the connection for the next process that uses that connection.

What about modulo?

I have a case where I don't need true randomness. My criteria are:

  • not the same order every time
  • predictable order within pages of the same result set, so that we don't get duplicates on subsequent pages

Eg, this would be fine:

  • Listing 1
    • page 1: items 1, 4
    • page 2: items 3, 2
  • Listing 2 (different user, or same user coming back later)
    • page 1: items 3, 1
    • page 2: items 2, 4

To get something like this, modulo seems to work well. Eg, ORDER BY id % 7, id for all pages of request 1, and ORDER BY id % 11, id for all pages of request 2. That is, for each row, divide its id by the modulus and sort by the remainder. Within rows with the same remainder, sort by id (to ensure the sort is stable).

The modulus could be picked randomly for the first page, then reused as a parameter for each subsequent page request.
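
For example, a minimal sketch of two pages of a single request (table name my_table assumed; the modulus 7 would have been picked at random when the request started):

SELECT * FROM my_table ORDER BY id % 7, id LIMIT 5 OFFSET 0;  -- page 1
SELECT * FROM my_table ORDER BY id % 7, id LIMIT 5 OFFSET 5;  -- page 2, same modulus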

You can see how this might work for your database like this:

echo "select id, id % 7 FROM my_table ORDER BY id % 77, id" | psql my_db > sort.txt

A prime modulus will probably give you the most variation. And if your ids start at 1 (such that % 77 would make the first 77 rows return in the normal order), you could try doing a modulus on a timestamp field instead. Eg:

ORDER BY (extract(epoch from inserted_at) * 100000)::bigint % 77

But you'd need a function index to make that performant.
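
A hedged sketch of such an index (table and column names assumed, and assuming inserted_at is a plain timestamp; with timestamptz the extract call isn't immutable and the expression would need adjusting). Note that an expression index is only used when the query's ORDER BY expression matches it exactly, modulus included:

CREATE INDEX my_table_epoch_mod_idx
  ON my_table (((extract(epoch from inserted_at) * 100000)::bigint % 77));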

Upvotes: 13

Nathan Long

Reputation: 126072

Specify the exact row IDs (randomized ahead of time)

This query will give you rows with id 4, 2, 1, and 4 again, in that exact order.

SELECT items.id, items.name
FROM items
-- unnest expands array values into rows
INNER JOIN unnest(ARRAY[4,2,1,4]) AS item_id
ON items.id = item_id

yields

 id |     name
----+---------------
  4 | Toast Mitten
  2 | Pickle Juicer
  1 | Horse Paint
  4 | Toast Mitten

Knowing that, you can supply the ids that should be included on each page however you want.

For example, you could SELECT id FROM items ORDER BY random(), break the list into "pages" of (say) 5 ids each, and hold it in application memory, or in Redis, or wherever. For each page requested, you'd run the query above with the correct page of ids.
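
For instance, if the pre-shuffled list put ids 3, 7, 9, 5 and 8 on page 2 (made-up ids, purely for illustration), that page's request would just swap in that array:

SELECT items.id, items.name
FROM items
INNER JOIN unnest(ARRAY[3,7,9,5,8]) AS item_id
ON items.id = item_id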

Variations:

  • For true randomness, you could enable pgcrypto and ORDER BY gen_random_uuid() (see the sketch after this list).
  • You could omit the ORDER BY and shuffle the ids in-memory in your programming language.
  • You could create a different shuffling per user or per day.
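
A minimal sketch of the pgcrypto variation mentioned above (assuming you're able to create extensions in your database):

CREATE EXTENSION IF NOT EXISTS pgcrypto;
SELECT id FROM items ORDER BY gen_random_uuid();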

Upvotes: 0

Clodoaldo Neto

Reputation: 125444

With this union all technique the random order is repeatable: setseed() runs in the first row of the subquery (which offset 1 then discards), so the outer order by random() always starts from the same seed.

select a, b
from (
    -- this row exists only to call setseed(); "offset 1" below discards it
    select setseed(0.1), null as a, null as b

    union all

    select null, a, b
    from t

    offset 1
) s
order by random()  -- random() now starts from the fixed seed, so the order is repeatable
offset 0
limit 5
;
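
A sketch of the next page (not part of the original answer): re-run the same statement with the same seed and change only the outer offset.

select a, b
from (
    select setseed(0.1), null as a, null as b
    union all
    select null, a, b from t
    offset 1
) s
order by random()
offset 5
limit 5;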

Upvotes: 5

Mureinik

Reputation: 311948

You can use setseed(dp) to seed random() with a seed in [-1.0, 1.0]. E.g.:

engine=> SELECT SETSEED(0.16111981);
 setseed 
---------

(1 row)

engine=> SELECT RANDOM();
      random       
-------------------
 0.205839179921895
(1 row)

engine=> SELECT RANDOM();
      random       
-------------------
 0.379503262229264
(1 row)

engine=> SELECT RANDOM();
      random       
-------------------
 0.268553872592747
(1 row)

engine=> SELECT RANDOM();
      random       
-------------------
 0.788029655814171
(1 row)

And of course, each time you reseed, you'll get the exact same result:

engine=> SELECT SETSEED(0.16111981), RANDOM();
 setseed |      random       
---------+-------------------
         | 0.205839179921895
(1 row)

engine=> SELECT SETSEED(0.16111981), RANDOM();
 setseed |      random       
---------+-------------------
         | 0.205839179921895
(1 row)

engine=> SELECT SETSEED(0.16111981), RANDOM();
 setseed |      random       
---------+-------------------
         | 0.205839179921895
(1 row)

engine=> SELECT SETSEED(0.16111981), RANDOM();
 setseed |      random       
---------+-------------------
         | 0.205839179921895
(1 row)
(clarification: output was copied from psql, engine is the name of my database)
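
Tying this back to the question, a hedged sketch would be to re-seed on the same connection immediately before each page's query, so that (assuming the rows are scanned in the same order each time) the sort is reproducible and only the offset changes:

SELECT SETSEED(0.16111981);
SELECT * FROM "table" ORDER BY RANDOM() LIMIT 5 OFFSET 0;

-- later, for the next page: same connection, same seed, next offset
SELECT SETSEED(0.16111981);
SELECT * FROM "table" ORDER BY RANDOM() LIMIT 5 OFFSET 5;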

Upvotes: 3
