Reputation: 121
I have a series of queries against a very mega large database, and I have hundreds-of-thousands of ORs in WHERE clauses. What is the best and easiest way to optimize such SQL queries? I found some articles about creating temporary tables and using joins, but I am unsure. I'm new to serious SQL, and have been cutting and pasting results from one into the next.
SELECT doc_id, language, author, title FROM doc_text WHERE language='fr' OR language='es'
SELECT doc_id, ref_id FROM doc_ref WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR OR OR ...
SELECT ref_id, location_id FROM ref_master WHERE ref_id=098765 OR ref_id=987654 OR ref_id=876543 OR OR OR ...
SELECT location_id, location_display_name FROM location
SELECT doc_id, index_code, FROM doc_index WHERE doc_id=1234567 OR doc_id=1234570 OR doc_id=1234572 OR doc_id=1234596 OR OR OR x100,000
These unoptimized query can take over 24 hours each. Cheers.
Upvotes: 3
Views: 4187
Reputation: 103617
I think your real problem is that you are not JOINing tables.
this is a guess, but I'll bet that you run a query and then get all the IDs in your application and then run another query WHERE all the rows match from the previous query. You would greatly improve performance by writing a query with a join:
SELECT
*
FROM YourTableA a
INNER JOIN YourTableB b ON a.ID=b.ID
WHERE a. .....
then process the single result set in your application.
Upvotes: 0
Reputation: 700472
Instead of having a lot of conditions on the same field, you can use the in
keyword:
SELECT doc_id, ref_id FROM doc_ref WHERE doc_id in (1234567, 1234570, 1234572, 1234596, ...)
This will make the queries shorter, but it's not certain that the performance will differ much. You should make sure that you have indexes on the relevant fields, that usually makes a huge difference for the performance.
However, it seems that the reason that you have a lot of values to compare is that you are using the result from one query to create the next. This should of course be solved with a join instead of a dynamic query:
select
doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title,
doc_ref.ref_id, ref_master.location_id, location.location_display_name,
doc_index.doc_id, doc_index.display_heading
from DOC_TEXT
inner join DOC_REF on doc_text.doc_id = doc_ref.doc_id
inner join REF_MASTER on doc_ref.doc_id = ref_master.ref_id
inner join LOCATION on ref_master.location_id = location.location_id
inner join DOC_INDEX on doc_text.doc_id = doc_index.doc_id
where
doc_text.language in ('fr', 'es')
Upvotes: 3
Reputation: 338278
The easiest way to get that done is this:
language
, ref_id
, doc_id
, etc), at least double check their existence. Make them clustered if they are the primary index of the table.So...
SELECT doc_id, language, author, title
FROM doc_text
WHERE language='fr' OR language='es'
becomes
INSERT language_search (language) VALUES ('fr')
INSERT language_search (language) VALUES ('es')
/* and 50 more */
SELECT dt.doc_id, dt.language, dt.author, dt.title
FROM doc_text dt
INNER JOIN language_search ls ON dt.language = ls.language
Upvotes: 5
Reputation: 121
I think I just answered my own question... NESTED TABLES!
SELECT doc_text.doc_id, doc_text.language, doc_text.author, doc_text.title, doc_ref.ref_id, ref_master.location_id, location.location_display_name, doc_index.doc_id, doc_index.display_heading
FROM DOC_TEXT, DOC_REF, REF_MASTER, LOCATION, DOC_INDEX
WHERE
doc_text.language='fr' OR doc_text.language='es'
AND
doc_text.doc_id=doc_ref.doc_id
AND
doc_ref.doc_id=ref_master.ref_id
AND
ref_master.location_id=location.location_id
AND
doc_text.doc_id=doc_index.doc_id
Upvotes: 7