Reputation: 1503
I have a Python batch job that executes a SQL query based on supplied ids.
The number of ids is usually above 20k.
The query is:
SELECT MY_ID, COL_A, COL_B FROM SOME_TABLE WHERE MY_ID IN {dynamically_set_ids}
Because Oracle limits the number of values in an IN
clause to 1,000, I updated the query to:
WITH ids_list AS
  (SELECT id1 AS my_id FROM dual
   UNION
   SELECT id2 AS my_id FROM dual ...) -- all 20k+ ids added like this
SELECT col_a,
       col_b,
       some_table.my_id
FROM some_table
JOIN ids_list
  ON ids_list.my_id = some_table.my_id
The above query executes in around 33 minutes.
But when I split the ids into chunks of at most 1,000 and execute the query in a loop:
con = get_con()
for i in range(calculated_iterations):
    # ids_chunk holds the (at most 1,000) ids for this iteration
    con.execute("select my_id, col_a, col_b from some_table "
                "where my_id in {0}".format(ids_chunk))
    # collect data
For the current amount of data, the loop above returns the expected data in only around 40 seconds.
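For reference, here is a minimal sketch of how the chunked loop could look with bind variables instead of string formatting, so the ids are bound rather than concatenated into the SQL text (get_con() and the table/column names are from above; fetch_in_chunks is a hypothetical helper, and I'm assuming a DB API driver such as python-oracledb/cx_Oracle):

def fetch_in_chunks(con, ids, chunk_size=1000):
    """Run the query once per chunk of at most 1,000 ids."""
    cur = con.cursor()
    rows = []
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # one bind placeholder per id: :1, :2, ..., :n
        placeholders = ", ".join(":{0}".format(i + 1) for i in range(len(chunk)))
        cur.execute(
            "select my_id, col_a, col_b from some_table "
            "where my_id in ({0})".format(placeholders),
            chunk)
        rows.extend(cur.fetchall())
    return rows

# usage:
# con = get_con()
# data = fetch_in_chunks(con, all_ids)  # all_ids is the 20k+ id list

One caveat: each distinct placeholder count is a new statement for Oracle to parse, so padding the last chunk to a full 1,000 values (e.g. by repeating its final id) keeps it to a single cached statement.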
So, the question is: is it suggested/OK to execute the SELECT in a loop?
The performance gain doesn't matter much, as the job runs overnight.
But for such a gain, would a loop be preferred?
Note that the amount of data cannot be predicted in advance; it usually runs into millions of rows.
Upvotes: 0
Views: 45
Reputation: 3872
20K 'union select ... from dual' statements? Regardless of run-time performance, that is coding madness!
The 'textbook' solution is to put those 20k values in a text file, define an external table over it, and then run:
SELECT MY_ID,
       COL_A,
       COL_B
FROM SOME_TABLE
WHERE MY_ID IN (SELECT ext_id
                FROM my_external_table);
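For illustration, a minimal external table definition over such a file might look like the following sketch (the directory object, file name, and column type here are assumptions, not part of the answer):

CREATE TABLE my_external_table (
  ext_id NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_ids_dir       -- an existing Oracle DIRECTORY object
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE      -- one id per line in the file
  )
  LOCATION ('ids.txt')
)
REJECT LIMIT UNLIMITED;

Refreshing the ids for each batch run is then just a matter of rewriting ids.txt; no DDL and no 20k-branch statement has to be regenerated.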
Upvotes: 2