Reputation: 1725
I was wondering how to get random rows within a SQL query since the full query has over 10 Billion rows and would explode our servers.
How can I query a subset which is sampled in this query structure?
SELECT
a,b,c
FROM test
WHERE
test.a= 123
AND test.b ILIKE '10008383825311900000'
LIMIT 1000000
Upvotes: 0
Views: 72
Reputation: 1271051
The canonical answer is to sort and use limit
:
select t.*
from t
order by rand()
limit 100;
But do not do this! Instead, use rand()
in a where
clause. For a 1% sample:
select t.*
from t
where rand() < 0.01;
Random sampling methods in MySQL tend to require scanning the entire table, which is going to be expensive in your case.
EDIT:
To optimize your query, I would start by using =
rather than ILIKE
:
SELECT a, b, c
FROM test
WHERE test.a = 123 AND
test.b = '10008383825311900000'
LIMIT 1000000;
You want an index on test(a, b, c)
.
Upvotes: 3
Reputation: 13026
Here's another answer.
select * from (
select
a,b,c
,row_number() over (order by a) as rn
from test
where
t1.a= 123
AND t1.b ILIKE '10008383825311900000'
) t1
inner join
(select floor(rand()*100) as rn from test limit 1000000) t2 on t2.rn = t1.rn
Upvotes: 0