Reputation: 83
Below is the query which takes around 76 secs.
Is there a way to optimize this?
FYI I have indexed the id and date fields.
Sorry I should have explained more:
I have a bunch of ids, dates and conditions (=which are either 1s or 0s).
I collect the ids and dates where the conditions are 1 and this resulting group say Group1 is then queried to select the minimum date.
This minimum date is then used to flag the accounts in the Group1 to 1 if their dates are greater than the minimum dates.
I hope this helps. I apologize for being terse the first time.
SELECT
o.id,
o.date,
IF(o.date > m.min_date,1,0) b
FROM tab o
JOIN (SELECT DISTINCT
(id),
MIN(DATE) min_date
FROM tab
WHERE cond = 1
GROUP BY id) m
USING (id)
WHERE o.id = m.id
ORDER BY o.id,o.date
Upvotes: 1
Views: 84
Reputation: 13110
Your original query and not having a unique key on any field called id
in a table are strange practice, and it looks like you are after something like:
SELECT o.id,
o.date,
m.date IS NULL b
FROM tab o
LEFT JOIN (
SELECT account_id, MIN(date) min_date
FROM tab
WHERE cond=1
GROUP BY account_id
) m
ON m.account_id = o.account_id
AND m.min_date = o.date
ORDER BY o.id, o.date
This should be pretty fast. You may do well with a composite index on (account_id, date)
for the inner grouped SELECT
.
Here are some pointers in case you weren't aware:
DISTINCT
operates on all columns selected, DISTINCT(id), MIN(date) min_date
will give you all the distinct combinations of id and min_date, id by itself may be duplicated and the brackets around it are meaningless. There is also no way to change the behaviour of DISTINCT
to work on only a subset of the selected columns.SELECT id, MIN(date) ... GROUP BY account_id
will not always return the id corresponding to the minimum date from the account_id group. The id returned can be any undetermined id from the group.. you will need to JOIN
the rows to the minimum date or use another method to find the corresponding id.Mazster's answer, which you have accepted, has a strange GROUP BY
and unnecessary IF
clause. I have rewritten it below in case this helps you, but am surprised if it runs quicker than the one I have updated above as it runs the subquery for each row. You would do well to examine the EXPLAIN
plans for each as I believe you can get it faster using the above query:
SELECT o.id,
o.date,
o.date > (
SELECT MIN(DATE) min_date
FROM tab m
WHERE m.cond = 1
AND m.id = o.id
AND m.account_id = o.account_id
) b
FROM tab o
ORDER BY o.id,o.date
Upvotes: 2
Reputation: 3720
I can see there is self join, what about getting rid of that? For instance:
SELECT
o.id,
o.date,
IF(
(SELECT
MIN(DATE) min_date
FROM tab m
WHERE cond = 1 AND o.id = m.id
GROUP BY account_id
) < o.date,1,0
) b
FROM tab o
ORDER BY o.id,o.date
Upvotes: 1