Ozooha Ozooha
Ozooha Ozooha

Reputation: 83

Optimize group by join MySQl

Below is the query which takes around 76 secs.

Is there a way to optimize this?

FYI I have indexed the id and date fields.

Sorry I should have explained more:

I hope this helps. I apologize for being terse the first time.

SELECT
  o.id,
  o.date,
  IF(o.date > m.min_date,1,0) b
FROM tab o
  JOIN (SELECT DISTINCT
      (id),
      MIN(DATE)    min_date
    FROM tab
    WHERE cond = 1
    GROUP BY id) m
    USING (id)
WHERE o.id = m.id
ORDER BY o.id,o.date

Upvotes: 1

Views: 84

Answers (2)

Arth
Arth

Reputation: 13110

Your original query and not having a unique key on any field called id in a table are strange practice, and it looks like you are after something like:

    SELECT o.id,                        
           o.date,
           m.date IS NULL b
      FROM tab o
 LEFT JOIN (
        SELECT account_id, MIN(date) min_date
          FROM tab
         WHERE cond=1
      GROUP BY account_id
           ) m
        ON m.account_id = o.account_id
       AND m.min_date = o.date
  ORDER BY o.id, o.date

This should be pretty fast. You may do well with a composite index on (account_id, date) for the inner grouped SELECT.

Here are some pointers in case you weren't aware:

  • DISTINCT operates on all columns selected, DISTINCT(id), MIN(date) min_date will give you all the distinct combinations of id and min_date, id by itself may be duplicated and the brackets around it are meaningless. There is also no way to change the behaviour of DISTINCT to work on only a subset of the selected columns.
  • SELECT id, MIN(date) ... GROUP BY account_id will not always return the id corresponding to the minimum date from the account_id group. The id returned can be any undetermined id from the group.. you will need to JOIN the rows to the minimum date or use another method to find the corresponding id.

Mazster's answer, which you have accepted, has a strange GROUP BY and unnecessary IF clause. I have rewritten it below in case this helps you, but am surprised if it runs quicker than the one I have updated above as it runs the subquery for each row. You would do well to examine the EXPLAIN plans for each as I believe you can get it faster using the above query:

  SELECT o.id,
         o.date,
         o.date > (
     SELECT MIN(DATE) min_date
       FROM tab m
      WHERE m.cond = 1 
        AND m.id = o.id
        AND m.account_id = o.account_id
         ) b
    FROM tab o
ORDER BY o.id,o.date

Upvotes: 2

maszter
maszter

Reputation: 3720

I can see there is self join, what about getting rid of that? For instance:

SELECT
    o.id,
    o.date,
    IF(
        (SELECT 
            MIN(DATE) min_date
            FROM tab m
            WHERE cond = 1 AND o.id = m.id
            GROUP BY account_id
        ) < o.date,1,0
    ) b
FROM tab o
ORDER BY o.id,o.date

Upvotes: 1

Related Questions