Reputation: 434
I have a query which aims to retrieve a random row from a result set. I do not want to use ORDER BY Rand()
as it seems to be rather inefficient.
My method is as follows:
example query:
SELECT * FROM(
(SELECT @rand := RAND(), @rank := 0) r1
CROSS JOIN
(SELECT (@rank:=@rank+1) as num, A.id FROM
A JOIN B
ON A.id = B.id
WHERE B.number = 42
)
WHERE num = FLOOR(1 + @rand * @rank) LIMIT 1
This works for retrieving one row, but I instead want 10 random rows. Changing LIMIT 1
to LIMIT 10
doesn't work, because if num + 10 > number of rows
the query doesn't return 10 rows.
The only solution I can think of it to either generate 10 random numbers in the sql query, check they are all different from each other and have several WHERE num = random_number_1
lines. Alternatively, I could call the query 10 times, checking that the rows selected are unique. I wouldn't know how to do the former, and the latter seems like it is rather inefficient. Unless there is likely to be some wonderful cache that would make running the same query extremely fast?
Does anyone have any ideas? thank you
Upvotes: 2
Views: 933
Reputation: 1856
You could try the following:
select sq2.c1
from ( select *
from (select @count := 0) sq0
cross join
(select t1.c1, @count := @count+1
from t t1
join t t2
using(c1)
where t2.c2 = 42
) sq1
) sq2
--use a probability to pick random rows
where if(@count <= 5, 1, floor(1 + rand() * (@count-1))) <= ceiling(log(pow(@count,2)))+1
limit 5;
The results will be random unless the result set is smaller (or the same size as) the limit. If this is a problem, you can wrap the whole thing:
select sq3.* from ( select ... limit 5 ) sq3
order by rand().
This will only randomize the small number of output rows (at most 5) which is efficient.
Of course, you can always use a temporary table:
create temporary table rset (row_key int auto_increment, key(row_key))
as ( select .... where c2 = 42 ) engine=myisam;
set @count := select count(*) from rset;
select rset.c1
from rset
where row_key in ( (floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))) );
drop table rset;
If you want to guarantee that you get five unique rows, then you can use a second temporary table:
create temporary table row_keys ( row_key int not null primary key );
-- do this successful five times. if you get a unique key error try again
insert into row_keys values (floor(1 + rand() * (@count-1));
select rset.c1
from rset
join row_keys
using(row_key);
Upvotes: 1