Reputation: 32124
I'm using postgresql 12.1 to perform that task.
I have a table called numbers
with a relevant column called number_a
, this table doesn't not have an auto incremented primary key.
I need to fetch random numbers from it, the problem is that this table is pretty huge, could be close to 1b of rows. So even if I select one row by random it takes a long time.
Executing select number_1 from numbers order by random() limit 1
is really really slow
I googled and found some websites that explain that you can you the increment ID, to to use that to find random keys, but I don't have it in this table and I can't modify it.
I googled some more and found some postgresql extensions, but I prefer not to install a 3rd party extension just for this kind of a solution.
So does anyhow know how to properly select by rand when I don't have an incremental key?
I can't change the table.
Upvotes: 0
Views: 142
Reputation: 32124
Well I wanted just to give more information about the correct answer.
Postgresql supports tablesample
which provides random result set. For my needs it's random enough. I checked and each time I do get a different result set. That's good enough for me.
Now, tablesample
supports SYSTEM and BERNOULLI sampling methods that are related more to percentage and less to actual rows.
There is an extension that comes with postgres (not 3rd party or anything) that supports specific rows numbers https://www.postgresql.org/docs/current/tsm-system-rows.html
So I can easily do:
CREATE EXTENSION tsm_system_rows;
select number_a from numbers tablesample system_rows(4);
And that will run. Very very fast:)
Upvotes: 1
Reputation: 32898
From a comment:
not sure if it's "random enough", but did you try
tablesample
? – a_horse_with_no_name
Upvotes: 2