Reputation: 2398
I need to select say 2000000 records at random from a very large database. I looked at previous questions. So please do not mark this question as duplicate. I need clarification. Most answers suggest using ORDER BY RAND()
function. So my query will be:
SELECT DISTINCT no
FROM table
WHERE name != "null"
ORDER BY RAND()
LIMIT 2000000;
I want each record to be selected at random. I am not sure if I understand the ORDER BY RAND()
effect here. But I am afraid it will select a random record, say 3498 and will continue selection from there, say, the next records will be: 3499, 3500, 3501, etc.
I want each recor to be random, not to start the order from a random record.
How can I select 2000000 random record where each record is selected at random? Can you simplify what exactly ORDER BY RAND()
does?
Note that I use Google BigQuery so the performance issue should not be a big problem here. I just want to achieve the requirement of selecting random 2000000 records.
Upvotes: 4
Views: 7310
Reputation: 33705
SELECT x
FROM T
ORDER BY RAND()
is equivalent to
SELECT x
FROM (
SELECT x, RAND() AS r
FROM T
)
ORDER BY r
The query generates a random value for each row, then uses that random value to order the rows. If you include a limit:
SELECT x
FROM T
ORDER BY RAND()
LIMIT 10
This randomly selects 10 rows from the table.
Upvotes: 8