retrieving multiple random rows from MySQL query result set - without using order by rand()

Question

I have a query which aims to retrieve a random row from a result set. I do not want to use ORDER BY Rand() as it seems to be rather inefficient.

My method is as follows:

generate a single random number between [0,1)
give each row of the result query a unique 'rank' number. i.e. give the first row a value 1, second row a value 2, and so forth
use the random number to get a number between 1 and the number of rows in the result
return the row where rank == the number generated from the random number

example query:

SELECT * FROM(

    (SELECT @rand := RAND(), @rank := 0) r1
    CROSS JOIN
    (SELECT (@rank:=@rank+1) as num, A.id FROM
    A JOIN B
    ON A.id = B.id
    WHERE B.number = 42
)
WHERE num = FLOOR(1 + @rand * @rank) LIMIT 1

This works for retrieving one row, but I instead want 10 random rows. Changing LIMIT 1 to LIMIT 10 doesn't work, because if num + 10 > number of rows the query doesn't return 10 rows.

The only solution I can think of it to either generate 10 random numbers in the sql query, check they are all different from each other and have several WHERE num = random_number_1 lines. Alternatively, I could call the query 10 times, checking that the rows selected are unique. I wouldn't know how to do the former, and the latter seems like it is rather inefficient. Unless there is likely to be some wonderful cache that would make running the same query extremely fast?

Does anyone have any ideas? thank you

Justin Swanhart · Accepted Answer

You could try the following:

select sq2.c1 
  from  ( select * 
            from (select @count :=  0) sq0
           cross join  
                 (select t1.c1, @count := @count+1        
                    from t t1       
                    join t t2      
                   using(c1)      
                   where t2.c2 = 42    
                 ) sq1  
         ) sq2   
 --use a probability to pick random rows
 where if(@count <= 5, 1, floor(1 + rand() * (@count-1))) <= ceiling(log(pow(@count,2)))+1
 limit 5;

The results will be random unless the result set is smaller (or the same size as) the limit. If this is a problem, you can wrap the whole thing:

select sq3.* from ( select ... limit 5 ) sq3 
order by rand().

This will only randomize the small number of output rows (at most 5) which is efficient.

Of course, you can always use a temporary table:

create temporary table rset (row_key int auto_increment, key(row_key))
as ( select .... where c2 = 42 ) engine=myisam;

set @count := select count(*) from rset;

select rset.c1 
  from rset 
 where row_key in (    (floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))),
(floor(1 + rand() * (@count-1))) );

drop table rset;

If you want to guarantee that you get five unique rows, then you can use a second temporary table:

create temporary table row_keys ( row_key int not null primary key );
-- do this successful five times.  if you get a unique key error try again
insert into row_keys values (floor(1 + rand() * (@count-1));

select rset.c1
  from rset
  join row_keys
  using(row_key);

retrieving multiple random rows from MySQL query result set - without using order by rand()

Answers (1)

Related Questions