stil
stil

Reputation: 5556

Generic greatest N per group query is too slow

The following query takes 18 minutes to complete. How can I optimize it to execute faster?

Basically, my query for every citizen joins row from citizens_static and citizens_dynamic table where update_id_to column is highest.

INSERT INTO latest_tmp (...)

SELECT cs1.*, cd1.*
FROM citizens c

JOIN citizens_static  cs1 ON c.id = cs1.citizen_id
JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id

JOIN (
    SELECT citizen_id, MAX(update_id_to) AS update_id_to
    FROM citizens_static
    GROUP BY citizen_id
) AS cs2 ON c.id = cs2.citizen_id AND cs1.update_id_to = cs2.update_id_to

JOIN (
    SELECT citizen_id, MAX(update_id_to) AS update_id_to
    FROM citizens_dynamic
    GROUP BY citizen_id
) cd2 ON c.id = cd2.citizen_id AND cd1.update_id_to = cd2.update_id_to;

latest_tmp table is MyISAM table with indexes disabled during import. Disabling them improved execution time from 20 minutes to 18 minutes, so it's not the biggest problem.

I also benchmarked LEFT JOIN approach with WHERE t2.column IS NULL. It takes several hours comparing to INNER JOIN approach which I'm using.

Explain query output below. It seems to be using indexes. explain

citizens_dynamic and citizens_static have primary key on citizen_id,update_id_to and secondary key named "id" on update_id_to,citizen_id columns.

Upvotes: 0

Views: 111

Answers (2)

deroby
deroby

Reputation: 6002

I'm not familiar enough with MySQL to be able to predict if this will run any better, but I would suggest to give this a try:

SELECT cs1.*, cd1.*
FROM citizens c

JOIN citizens_static  cs1 ON c.id = cs1.citizen_id
AND NOT EXISTS ( SELECT *
                   FROM citizens_static cs2
                  WHERE cs2.citizen_id = cs1.citizen_id
                    AND cs2.update_id > cs1.update_id )

JOIN citizens_dynamic cd1 ON c.id = cd1.citizen_id
AND NOT EXISTS ( SELECT *
                   FROM citizens_dynamic cd2
                  WHERE cd2.citizen_id = cd1.citizen_id
                    AND cd2.update_id > cd1.update_id )

PS: Please comment the running time (if it returns within the hour =), that way I might learn (not) to propose this construction in the future again.

Upvotes: 0

Rick James
Rick James

Reputation: 142298

Could you explain, in English, what you want?

Then see Groupwise Max And edit the following as needed:

SELECT
        province, n, city, population
    FROM
      ( SELECT  @prev := '', @n := 0 ) init
    JOIN
      ( SELECT  @n := if(province != @prev, 1, @n + 1) AS n,
                @prev := province,
                province, city, population
            FROM  Canada
            ORDER BY
                province,
                population DESC
      ) x
    WHERE  n <= 3
    ORDER BY  province, n;

Regardless of the ASC/DESC on the inner ORDER BY, there will be a full table scan and a 'filesort'.

Upvotes: 1

Related Questions