Freddie Fabregas
Freddie Fabregas

Reputation: 1203

MySQL query optimization sum.. case

Is there any way to optimize the query below?

SELECT 
DATE_FORMAT(a.duedate,'%d-%b-%y') AS dte, 
duedate, 
SUM(CASE WHEN (typeofnotice='ddat' AND STATUS='open') THEN 1 ELSE 0 END) AS 'DDatOpen', 
SUM(CASE WHEN (typeofnotice='ddat' AND STATUS='closed') THEN 1 ELSE 0 END) AS 'DDatClosed', 
SUM(CASE WHEN (b.action='tagunchanged' AND STATUS='closed') THEN 1 ELSE 0 END) AS 'DDatUnchanged',
SUM(CASE WHEN (typeofnotice='rss' AND validindicator IS NULL AND STATUS='open') THEN 1 ELSE 0 END) AS 'RSSValidation', 
SUM(CASE WHEN (typeofnotice='rss' AND validindicator=1 AND STATUS='open') THEN 1 ELSE 0 END) AS 'RSSValidOpen', 
SUM(CASE WHEN (typeofnotice='rss' AND validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'RSSValidClosed', 
SUM(CASE WHEN (typeofnotice='rss' AND validindicator=0) THEN 1 ELSE 0 END) AS 'RSSInvalid', 
SUM(CASE WHEN (typeofnotice='copernic' AND validindicator IS NULL AND STATUS='open') THEN 1 ELSE 0 END) AS 'CopernicValidation', 
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=1 AND validindicator=1 AND STATUS='open') THEN 1 ELSE 0 END) AS 'CopernicValidAwardOpen', 
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=1 AND validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'CopernicValidAwardClosed', 
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=1 AND validindicator=0) THEN 1 ELSE 0 END) AS 'CopernicInvalidAward', 
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=0 AND validindicator=1 AND STATUS='open') THEN 1 ELSE 0 END) AS 'CopernicOpportunityValidOpen', 
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=0 AND validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'CopernicOpportunityValidClosed', 
SUM(CASE WHEN (typeofnotice='copernic' AND isaward=0 AND validindicator=0) THEN 1 ELSE 0 END) AS 'CopernicOpportunityInvalid', 
SUM(CASE WHEN (typeofnotice='copernic' AND STATUS='limited') THEN 1 ELSE 0 END) AS 'CopernicLimitation', 
SUM(CASE WHEN ((validindicator IS NULL OR validindicator = 1) AND STATUS='open') THEN 1 ELSE 0 END) AS 'TotalNotices', 
SUM(CASE WHEN (validindicator=1 AND STATUS='closed') THEN 1 ELSE 0 END) AS 'TotalCompleted', 
SUM(CASE WHEN (validindicator=0 AND (typeofnotice='wget' OR typeofnotice='copernic' OR typeofnotice='rss')) THEN 1 ELSE 0 END) AS 'TotalInvalid'
FROM tblNotices AS a LEFT JOIN tblTransactions AS b 
ON a.id = b.noticeid WHERE b.noticeid IS NOT NULL
WHERE duedate >= '2011-04-04 00:00:00'  AND  a.duedate <= '2012-05-08 24:00:00' 
GROUP BY dte 
ORDER BY dueDate ASC;

tblTransactions has 1.5 million rows tblNotices has 900k rows.

And the query runs for about 1 min. Is it normal? Is there any way to optimize this query?

Describe 'table' and Explain Select

I think the DATE_FORMAT function here really kills the performance.. Is there any tips here? It runs for about 58 seconds.

Upvotes: 0

Views: 1212

Answers (2)

DRapp
DRapp

Reputation: 48139

The query looks as simple as it could get ... aside from your SUM( IF()) instances.

Your query has a LEFT-JOIN to transactions, but then have where b.noticeID is not null. The combination of the two results in a normal "JOIN" or "INNER JOIN" meaning required on both sides.

As for your Where clause, I would ensure you have a simple index on the due-date. Your table shows a "MUL" (multiple key) index, I would ensure the due date is the first part of that key, or at least ONE index has duedate as the first field.

Next, your group by. Since you are doing a group by the date formatting of the due date, I would just leave the group by based on the duedate (matching the table index) only. The visual formatted string will follow along for the ride anyhow. And since your Order By is also based on the duedate (and not the pretty formatted string version), it should be good to go.

As for the date range of the query itself... how many records are within the range provided.

REVISION

You MIGHT be able to take advantage of having a multipart index of the elements used in your query so the query doesn't have to actually go to the page data to look at the individual elements from the entire record. Since the data would be part of the index key, the query could use that directly and not require going to the pages.

Try an index on

INDEX on ... ( DueDate, status, IsAward, TypeOfNotice, ValidIndicator )

Also, for clarification of alias.fields in your query. Sometimes you explicitly reference the "a." alias and other times, have no alias. For others working on the query after you, or others trying to offer help. It makes it easier to explicitly reference all fields with proper alias to prevent ambiguity of which columns from what table. Yes, you provided your table structure here, but future could make things easier than having to keep looking back at structures.

Upvotes: 1

fancyPants
fancyPants

Reputation: 51868

Well, that

FROM tblNotices AS a LEFT JOIN tblTransactions AS b 
ON a.id = b.noticeid WHERE b.noticeid IS NOT NULL

is the same as

FROM tblNotices AS a INNER JOIN tblTransactions AS b 
ON a.id = b.noticeid

The second one could be faster.

Why don't you group by duedate? Then you could omit the DATE_FORMAT(), but that's just peanuts, I guess. If you omit it, the ORDER BY is also not necessary, since it's implicated in the GROUP BY then.

Apart from that there's not much you can do.

Upvotes: 0

Related Questions