Reputation: 5556
The following query takes 18 minutes to complete. How can I optimize it to execute faster?
Basically, my query for every citizen joins row from citizens_static
and citizens_dynamic
table where update_id_to
column is highest.
INSERT INTO latest_tmp (...)
SELECT cs1.*, cd1.*
FROM citizens c
JOIN citizens_static cs1 ON c.id = cs1.citizen_id
JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id
JOIN (
SELECT citizen_id, MAX(update_id_to) AS update_id_to
FROM citizens_static
GROUP BY citizen_id
) AS cs2 ON c.id = cs2.citizen_id AND cs1.update_id_to = cs2.update_id_to
JOIN (
SELECT citizen_id, MAX(update_id_to) AS update_id_to
FROM citizens_dynamic
GROUP BY citizen_id
) cd2 ON c.id = cd2.citizen_id AND cd1.update_id_to = cd2.update_id_to;
latest_tmp
table is MyISAM table with indexes disabled during import. Disabling them improved execution time from 20 minutes to 18 minutes, so it's not the biggest problem.
I also benchmarked LEFT JOIN
approach with WHERE t2.column IS NULL
. It takes several hours comparing to INNER JOIN
approach which I'm using.
Explain query output below. It seems to be using indexes.
citizens_dynamic
and citizens_static
have primary key on citizen_id,update_id_to
and secondary key named "id" on update_id_to,citizen_id
columns.
Upvotes: 0
Views: 111
Reputation: 6002
I'm not familiar enough with MySQL to be able to predict if this will run any better, but I would suggest to give this a try:
SELECT cs1.*, cd1.*
FROM citizens c
JOIN citizens_static cs1 ON c.id = cs1.citizen_id
AND NOT EXISTS ( SELECT *
FROM citizens_static cs2
WHERE cs2.citizen_id = cs1.citizen_id
AND cs2.update_id > cs1.update_id )
JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id
AND NOT EXISTS ( SELECT *
FROM citizens_dynamic cd2
WHERE cd2.citizen_id = cd1.citizen_id
AND cd2.update_id > cd1.update_id )
PS: Please comment the running time (if it returns within the hour =), that way I might learn (not) to propose this construction in the future again.
Upvotes: 0
Reputation: 142298
Could you explain, in English, what you want?
Then see Groupwise Max And edit the following as needed:
SELECT
province, n, city, population
FROM
( SELECT @prev := '', @n := 0 ) init
JOIN
( SELECT @n := if(province != @prev, 1, @n + 1) AS n,
@prev := province,
province, city, population
FROM Canada
ORDER BY
province,
population DESC
) x
WHERE n <= 3
ORDER BY province, n;
Regardless of the ASC/DESC on the inner ORDER BY, there will be a full table scan and a 'filesort'.
Upvotes: 1