Reputation: 35098
I have a table as follows
id| page | text
------------------------
1 | page1 | Hello World
2 | page1 | Foo Bar
3 | page2 | Baz Baz
3 | page2 | Some Text
4 | page3 | Some Other Text
I want to select 2 random entries - but each page is allowed to appear only once in the result.
I have
SELECT * FROM mydata ORDER BY RANDOM(); LIMIT 2
But can I combine this with DISTINCT
or grouping?
Upvotes: 2
Views: 1417
Reputation: 23552
That may work:
SELECT * FROM
(SELECT * FROM mydata GROUP BY page) t
ORDER BY RANDOM() LIMIT 2
Upvotes: 0
Reputation: 39423
Same as Erwin's answer, just a bit structured: http://www.sqlfiddle.com/#!1/d3e83/6
with first_random as
(
select * from tbl order by random() limit 1
)
, second_random as
(
select *
from tbl
where page <> (select page from first_random)
order by random() limit 1
)
select * from first_random
union
select * from second_random;
Same with a_horse_with_no_name's answer, except this is correct: http://www.sqlfiddle.com/#!1/d3e83/12
select id, page, text, rn
from (
select id, page, text,
row_number() over (partition by page order by random()) as rn
from tbl
) x
where rn = 1
order by random()
limit 2;
Opt for the latter, it has simpler execution plan
Upvotes: 1
Reputation: 657932
If you want:
... a total of two rows from the base table
... and give every page an equal chance to appear in the sample, regardless of how many entries it has in the table:
SELECT *
FROM (
SELECT DISTINCT ON (page) *
FROM mydata
ORDER BY page, random() -- pick one random entry per page
) x
ORDER BY random() -- pick two random pages
LIMIT 2;
Or, with a window function:
WITH x AS (
SELECT *, row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
FROM mydata
)
SELECT id, page, text
FROM x
WHERE rn = 1
ORDER BY random()
LIMIT 2;
You'll have to test which is faster.
If you are dealing with a big table and need fast performance, you can do better. Here is one way how.
If, on the other hand, you want:
... a total of two rows from table mydata
... and give every entry an (almost) equal chance a to appear in the sample, effectively giving a better chance to pages with more entries in the table.
Chances are still not truly equal - your restriction increases the chances for entries of rare pages by definition.
WITH x AS (
SELECT *
FROM mydata
ORDER BY random()
LIMIT 1
)
SELECT * FROM x
UNION ALL
(
SELECT m.*
FROM mydata m
, x
WHERE m.page <> x.page -- assuming page IS NOT NULL
ORDER BY random()
LIMIT 1
);
The parenthesis around the second SELECT
of the UNION
is required to allow for individual ordering.
Tested with PostgreSQL 9.1. Window functions require version 8.4 or later.
Upvotes: 1
Reputation:
Something like:
select id, page, text
from (
select id, page, text,
row_number() over (partition by page order by random()) as rn
from mydata
)
where rn <= 2
Upvotes: 2