Reputation: 3849
this query runs about 15 hours in production, I am looking for alternatives to improvements to this,
some improvements those are I think may help are commented in here:
SELECT table1.*
FROM table1
WHERE UPPER(LEFT(table1.cloumn1, 1)) IN ('A', 'B')
AND table1.cloumn2 = 'N' /* add composite index for cloumn2,
column3 */
AND table1.cloumn3 != 'Y'
AND table1.id IN (
SELECT MAX(id)
FROM table1
GROUP BY column5,column6
) /* move this clause to 2nd after
where */
AND table1.column4 IN (
SELECT column1
FROM table2
WHERE column2 IN ('VALUE1', 'VALUE2')
AND (SUBSTRING(column3,6,1) = 'Y'
OR SUBSTRING(column3,25,1) = 'Y')
) /* move this clause to 1st after
where */
AND (table1.column5,table1.column6) NOT IN (
SELECT column1, column2
FROM table3
WHERE table3.column3 IN ('A', 'B')/* add index for this column*/
)
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE,
INTERVAL - 28 DAY)) /* need index ON this col? */ ;
Any comments/suggestions are appreciated.
Update: with only updating filtering order, Query performance was improved to ~ 28 secs, will update here after adding some indexes and replacing some subqueries to joins
Upvotes: 2
Views: 126
Reputation: 142346
(It might help to see SHOW CREATE TABLE
.)
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE,
INTERVAL - 28 DAY))
cannot use an index; this might be equivalent:
AND timstampColumn > CURRENT_DATE - INTERVAL 28 DAY
Please provide EXPLAIN
.
What version are you using?
It might (version dependent) help to turn the IN ( SELECT ... )
clauses into 'derived' tables:
JOIN ( SELECT ... ) ON ...
WHERE (x,y) IN ...
is not well optimized. What types of values are they?
With a *_ci collation,
UPPER(LEFT(table1.cloumn1, 1)) IN ('A', 'B')
could be done:
LEFT(table1.cloumn1, 1) IN ('A', 'B')
That won't help performance noticeably. It would be better not to have to break apart columns for testing.
This might use an index involving cloumn1
:
table1.cloumn1 >= 'A'
AND table1.cloumn1 < 'C'
The order of things AND'd together rarely matters. The order in the INDEX
can make a big difference.
Upvotes: 0
Reputation: 21523
Assuming you can add useful indexes (which will help on some of your checks), then maybe try and exclude rows as early as possible.
I suspect you have quite a few rows on table1 for each column5 / column6 combination. If you can get just the latest of each of these (ie, using a sub query that you join) as early as possible then you can exclude most rows from table1 before you need to check any of the non indexed WHERE clauses. You can also exclude some of these by doing a further join against a sub query on table3.
Not tested, but if my assumptions about your database structure are correct then this might be an improvement:-
SELECT table1.*
FROM
(
SELECT MAX(table1.id) AS max_id
FROM table1
INNER JOIN
(
SELECT DISTINCT column1, column2
FROM table3
WHERE table3.column3 IN ('A', 'B')
AND DATE_FORMAT(timstampColumn, '%Y/%m/%d') > DATE_ADD(CURRENT_DATE, INTERVAL - 28 DAY)
) sub0_0
ON table1.column5 = sub0_0.column1
AND table1.column6 = sub0_0.column2
WHERE (table1.cloumn1 LIKE 'A%' OR table1.cloumn1 LIKE 'B%')
AND table1.cloumn2 = 'N'
AND table1.cloumn3 != 'Y'
GROUP BY table1.column5,
table1.column6
) sub0
INNER JOIN table1
ON table1.id = sub0.max_id
INNER JOIN
(
SELECT DISTINCT column1
FROM table2
WHERE column2 IN ('VALUE1', 'VALUE2')
AND (SUBSTRING(column3,6,1) = 'Y'
OR SUBSTRING(column3,25,1) = 'Y')
) sub1
ON table1.column4 = sub1.column1
Upvotes: 1