Reputation: 1203
Is there any way to optimize the query below?
SELECT
DATE_FORMAT(a.duedate,'%d-%b-%y') AS dte,
duedate,
SUM(CASE WHEN (typeofnotice='ddat' AND STATUS='open') THEN 1 ELSE 0 END) AS 'DDatOpen',
SUM(CASE WHEN (typeofnotice='ddat' AND STATUS='closed') THEN 1 ELSE 0 END) AS 'DDatClosed',
SUM(CASE WHEN (b.action='tagunchanged' AND STATUS='closed') THEN 1 ELSE 0 END) AS 'DDatUnchanged',
SUM(CASE WHEN (typeofnotice='rss' AND validindicator IS NULL AND STATUS='open') THEN 1 ELSE 0 END) AS 'RSSValidation',
SUM(CASE WHEN (typeofnotice='rss' AND validindicator=1 AND STATUS='open') THEN 1 ELSE 0 END) AS 'RSSValidOpen',
SUM(CASE WHEN (typeofnotice='rss' AND validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'RSSValidClosed',
SUM(CASE WHEN (typeofnotice='rss' AND validindicator=0) THEN 1 ELSE 0 END) AS 'RSSInvalid',
SUM(CASE WHEN (typeofnotice='copernic' AND validindicator IS NULL AND STATUS='open') THEN 1 ELSE 0 END) AS 'CopernicValidation',
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=1 AND validindicator=1 AND STATUS='open') THEN 1 ELSE 0 END) AS 'CopernicValidAwardOpen',
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=1 AND validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'CopernicValidAwardClosed',
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=1 AND validindicator=0) THEN 1 ELSE 0 END) AS 'CopernicInvalidAward',
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=0 AND validindicator=1 AND STATUS='open') THEN 1 ELSE 0 END) AS 'CopernicOpportunityValidOpen',
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=0 AND validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'CopernicOpportunityValidClosed',
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=0 AND validindicator=0) THEN 1 ELSE 0 END) AS 'CopernicOpportunityInvalid',
SUM(CASE WHEN (typeofnotice='copernic' AND STATUS='limited') THEN 1 ELSE 0 END) AS 'CopernicLimitation',
SUM(CASE WHEN ((validindicator IS NULL OR validindicator = 1) AND STATUS='open') THEN 1 ELSE 0 END) AS 'TotalNotices',
SUM(CASE WHEN (validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'TotalCompleted',
SUM(CASE WHEN (validindicator=0 AND (typeofnotice='wget' OR typeofnotice='copernic' OR typeofnotice='rss')) THEN 1 ELSE 0 END) AS 'TotalInvalid'
FROM tblNotices AS a LEFT JOIN tblTransactions AS b
ON a.id = b.noticeid WHERE b.noticeid IS NOT NULL
WHERE duedate >= '2011-04-04 00:00:00' AND a.duedate <= '2012-05-08 24:00:00'
GROUP BY dte
ORDER BY dueDate ASC;
tblTransactions has 1.5 million rows tblNotices has 900k rows.
And the query runs for about 1 min. Is it normal? Is there any way to optimize this query?
I think the DATE_FORMAT
function here really kills the performance.. Is there any tips here? It runs for about 58 seconds.
Upvotes: 0
Views: 1212
Reputation: 48139
The query looks as simple as it could get ... aside from your SUM( IF()) instances.
Your query has a LEFT-JOIN to transactions, but then have where b.noticeID is not null. The combination of the two results in a normal "JOIN" or "INNER JOIN" meaning required on both sides.
As for your Where clause, I would ensure you have a simple index on the due-date. Your table shows a "MUL" (multiple key) index, I would ensure the due date is the first part of that key, or at least ONE index has duedate as the first field.
Next, your group by. Since you are doing a group by the date formatting of the due date, I would just leave the group by based on the duedate (matching the table index) only. The visual formatted string will follow along for the ride anyhow. And since your Order By is also based on the duedate (and not the pretty formatted string version), it should be good to go.
As for the date range of the query itself... how many records are within the range provided.
REVISION
You MIGHT be able to take advantage of having a multipart index of the elements used in your query so the query doesn't have to actually go to the page data to look at the individual elements from the entire record. Since the data would be part of the index key, the query could use that directly and not require going to the pages.
Try an index on
INDEX on ... ( DueDate, status, IsAward, TypeOfNotice, ValidIndicator )
Also, for clarification of alias.fields in your query. Sometimes you explicitly reference the "a." alias and other times, have no alias. For others working on the query after you, or others trying to offer help. It makes it easier to explicitly reference all fields with proper alias to prevent ambiguity of which columns from what table. Yes, you provided your table structure here, but future could make things easier than having to keep looking back at structures.
Upvotes: 1
Reputation: 51868
Well, that
FROM tblNotices AS a LEFT JOIN tblTransactions AS b
ON a.id = b.noticeid WHERE b.noticeid IS NOT NULL
is the same as
FROM tblNotices AS a INNER JOIN tblTransactions AS b
ON a.id = b.noticeid
The second one could be faster.
Why don't you group by duedate? Then you could omit the DATE_FORMAT(), but that's just peanuts, I guess. If you omit it, the ORDER BY
is also not necessary, since it's implicated in the GROUP BY then.
Apart from that there's not much you can do.
Upvotes: 0