Peter Nixey

Reputation: 16565

How do you efficiently (in a DB independent manner) select random records from a table?

This seems like an incredibly simple problem; however, it isn't working out as trivially as I'd expected.

I have a club which has club members and I'd like to pull out two members at random from a club.

Using RANDOM()

One way is to use random ordering:

club.members.order('RANDOM()').limit(2)

However, that isn't database independent: SQLite (the dev database) and Postgres (production) both use RANDOM(), but in MySQL the command is RAND().

While I could start writing some wrappers around this, the fact that it hasn't been done already and doesn't seem to be part of ActiveRecord suggests that RANDOM() may not be the right way to go.
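
A minimal sketch of such a wrapper, assuming the function name is the only difference that matters. `random_function_for` is a hypothetical helper, not part of ActiveRecord; in a Rails app the adapter name would presumably come from `ActiveRecord::Base.connection.adapter_name`.

```ruby
# Hypothetical helper: maps a database adapter name to that database's
# random-number SQL function. Not part of ActiveRecord.
def random_function_for(adapter_name)
  case adapter_name.to_s.downcase
  when /mysql/ then 'RAND()'             # MySQL
  when /sqlite/, /postg/ then 'RANDOM()' # SQLite and PostgreSQL
  else
    raise ArgumentError, "unknown adapter: #{adapter_name}"
  end
end

# Assumed usage inside a Rails app:
#   fn = random_function_for(ActiveRecord::Base.connection.adapter_name)
#   club.members.order(fn).limit(2)
```

This only hides the naming difference; the query still sorts every row to return two of them.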

Pulling items out directly using their index

Another way of doing this is to pull the set in order but then select random records from it:

First off we need to generate a sequence of two unique indices corresponding to the members:

all_indices = 0...club.members.count  # values_at takes zero-based positions
two_rand_indices = all_indices.to_a.sample(2)

This gives an array with two indices guaranteed to be unique and random. We can use these indices to pull out our records:

@user1, @user2 = club.members.to_a.values_at(*two_rand_indices)
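
For illustration, here is the same idea on a plain Ruby array standing in for the loaded members (the names are made up). `values_at` takes zero-based positions, and `sample(2)` picks two distinct elements without a separate shuffle.

```ruby
# Stand-in for club.members.to_a; the names are illustrative only.
members = %w[alice bob carol dave erin]

# Two distinct zero-based positions into the array.
two_rand_indices = (0...members.size).to_a.sample(2)

user1, user2 = members.values_at(*two_rand_indices)

# Equivalent shortcut when the records are already in memory.
user3, user4 = members.sample(2)
```

Either form loads every member into memory first, so it only suits small clubs.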

What's the best method?

While the second method seems pretty nice, I also feel like I might be missing something and might have overcomplicated a simple problem. I'm clearly not the first person to have tackled this, so what is the best, most SQL-efficient route through it?

Upvotes: 2

Views: 280

Answers (3)

Bill Karwin

Reputation: 562651

The problem with your first method is that it sorts the whole table by an unindexable expression, just to take two rows. This does not scale well.

The problem with your second method is similar: if you have 10^9 rows in your table, to_a will generate a huge array, which will take a lot of memory and time to shuffle.

Also by using values_at aren't you assuming that there's a row for every primary key value from 1 to count, with no gaps? You shouldn't assume that.

What I'd recommend instead is:

  1. Count the rows in the table.

    c = club.members.count
    
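  2. Pick two random offsets between 0 and count - 1. OFFSET is zero-based, so an offset equal to the count would return no row.

    r_a = 2.times.map { Random.rand(c) }  # the two offsets may coincide; redraw if you need distinct rows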
    
  3. Query your table with limit and offset.
    Don't use ORDER BY, just rely on the RDBMS's arbitrary ordering.

    rows = r_a.map do |r|
        # limit(1).offset(r) builds the query; to_a.first runs it and returns the record
        club.members.limit(1).offset(r).to_a.first
    end
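
The three steps can be sketched without a database. In this sketch an array plays the table and `drop(offset).first` stands in for `LIMIT 1 OFFSET n`; `distinct_offsets` is a hypothetical helper that avoids building an array of every index and guarantees two different rows.

```ruby
require 'set'

# Hypothetical helper: draws n distinct offsets in 0...count without
# building an array of every possible offset.
def distinct_offsets(count, n)
  n = [n, count].min
  picked = Set.new
  picked << rand(count) while picked.size < n
  picked.to_a
end

# Stand-in for a table whose primary keys have gaps.
rows = [{ id: 3 }, { id: 7 }, { id: 8 }, { id: 15 }]

offsets = distinct_offsets(rows.size, 2)

# rows.drop(o).first mimics "LIMIT 1 OFFSET o".
chosen = offsets.map { |o| rows.drop(o).first }
```

Because offsets count positions rather than primary key values, gaps in the ids do not matter here.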
    


Upvotes: 1

binarycode

Reputation: 1806

Try the randumb gem; it implements the second method you mentioned.

Upvotes: 0

Donal.Lynch.Msc

Reputation: 3615

In MySQL, use ORDER BY RAND():

ORDER BY RAND() LIMIT 4

When this is the final clause in the query, it selects 4 random rows.

Upvotes: 0
