Reputation: 367
I want to extract a roughly 5 million row sample from a table that will contain somewhere between 10 million and 20 million rows.
Due to the large number of rows, efficiency is key. As such, I am trying to avoid sorting the rows where possible, hence why I am avoiding the dbms_random.value solution that I have seen in similar questions.
I tried to do something like the following:
SELECT *
FROM full_table
SAMPLE (CEIL(100 * 5000000 / (SELECT COUNT(*) FROM full_table)));
However, I can't seem to do arithmetic in the SAMPLE clause (ORA-00933: SQL command not properly ended - I tried this with a simple SAMPLE(10/2) and still get the same thing).
Is this a reasonable approach, and, if so, how do I calculate the number of rows in the sample clause?
Upvotes: 2
Views: 745
Reputation: 132580
You could use PL/SQL with dynamic SQL like this:
declare
cnt integer;
begin
select count(*) into cnt from full_table;
dbms_output.put_line(cnt);
execute immediate
'insert into target_table'
||' select * from full_tablesample (' || ceil(100 * 5000000/cnt) || ')';
end;
Upvotes: 4
Reputation: 1269853
If you need roughly 5 million rows, you can do:
SELECT ft.*
FROM full_table ft cross join
(SELECT COUNT(*) as cnt FROM full_table) x
WHERE dbms_random.value * cnt < 5000000;
This uses simple arithmetic to determine which rows go into the table. The result should be pretty close to 5,000,000.
Upvotes: 2