Alex
Alex

Reputation: 35098

Postgresql: Combine distinct/group by with ORDER BY RANDOM()

I have a table as follows

id| page |   text
------------------------
1 | page1 | Hello World
2 | page1 | Foo Bar
3 | page2 | Baz Baz
3 | page2 | Some Text
4 | page3 | Some Other Text

I want to select 2 random entries - but each page is allowed to appear only once in the result.

I have

SELECT * FROM mydata ORDER BY RANDOM(); LIMIT 2

But can I combine this with DISTINCT or grouping?

Upvotes: 2

Views: 1417

Answers (4)

PiTheNumber
PiTheNumber

Reputation: 23552

That may work:

SELECT * FROM
  (SELECT * FROM mydata GROUP BY page) t
ORDER BY RANDOM() LIMIT 2

Upvotes: 0

Michael Buen
Michael Buen

Reputation: 39423

Same as Erwin's answer, just a bit structured: http://www.sqlfiddle.com/#!1/d3e83/6

with first_random as
(
  select * from tbl order by random() limit 1
)
, second_random as
(
  select * 
  from tbl 
  where page <> (select page from first_random)
  order by random() limit 1
)
select * from first_random
union
select * from second_random;

Same with a_horse_with_no_name's answer, except this is correct: http://www.sqlfiddle.com/#!1/d3e83/12

select id, page, text, rn
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from tbl
) x
where rn = 1
order by random() 
limit 2;

Opt for the latter, it has simpler execution plan

Upvotes: 1

Erwin Brandstetter
Erwin Brandstetter

Reputation: 657932

If you want:
... a total of two rows from the base table
... and give every page an equal chance to appear in the sample, regardless of how many entries it has in the table:

SELECT *
FROM  (
    SELECT DISTINCT ON (page) *
    FROM   mydata
    ORDER  BY page, random() -- pick one random entry per page
    ) x
ORDER BY random() -- pick two random pages
LIMIT 2;

Or, with a window function:

WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
   FROM   mydata
   )
SELECT id, page, text
FROM   x
WHERE  rn = 1
ORDER  BY random()
LIMIT  2;

You'll have to test which is faster.
If you are dealing with a big table and need fast performance, you can do better. Here is one way how.


If, on the other hand, you want:
... a total of two rows from table mydata
... and give every entry an (almost) equal chance a to appear in the sample, effectively giving a better chance to pages with more entries in the table.
Chances are still not truly equal - your restriction increases the chances for entries of rare pages by definition.

WITH x AS (
   SELECT *
   FROM   mydata
   ORDER  BY random()
   LIMIT 1
   )
SELECT * FROM x
UNION ALL
(
SELECT m.*
FROM mydata m
   , x
WHERE m.page <> x.page -- assuming page IS NOT NULL
ORDER BY random()
LIMIT 1
);

The parenthesis around the second SELECT of the UNION is required to allow for individual ordering.
Tested with PostgreSQL 9.1. Window functions require version 8.4 or later.

Upvotes: 1

user330315
user330315

Reputation:

Something like:

select id, page, text
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from mydata
) 
where rn <= 2

Upvotes: 2

Related Questions