Reputation: 2457
I have an SQL query with a nested join:
SELECT rh.host, rh.report, COUNT(results.id), COUNT(results_2.id), COUNT(results_3.id), COUNT(results_4.id)
FROM report_hosts rh
INNER JOIN report_results rr ON rh.report = rr.report
LEFT OUTER JOIN results ON rr.result = results.id AND results.type = 'Hole' AND results.host = rh.host
LEFT OUTER JOIN results results_2 ON rr.result = results_2.id AND results_2.type = 'Warning' AND results_2.host = rh.host
LEFT OUTER JOIN results results_3 ON rr.result = results_3.id AND results_3.type = 'Note' AND results_3.host = rh.host
LEFT OUTER JOIN results results_4 ON rr.result = results_4.id AND results_4.type = 'Log' AND results_4.host = rh.host
GROUP BY rh.host
The query as-is takes about 5sec with 99.7% copying to temp table. An EXPLAIN
of the full query shows as:
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+------+---------------------------------+
| 1 | SIMPLE | rr | ALL | report | NULL | NULL | NULL | 3139 | Using temporary; Using filesort |
| 1 | SIMPLE | rh | ref | report | report | 5 | openvas.rr.report | 167 | Using where |
| 1 | SIMPLE | results | eq_ref | PRIMARY,type | PRIMARY | 4 | openvas.rr.result | 1 | |
| 1 | SIMPLE | results_2 | eq_ref | PRIMARY,type | PRIMARY | 4 | openvas.rr.result | 1 | |
| 1 | SIMPLE | results_3 | eq_ref | PRIMARY,type | PRIMARY | 4 | openvas.rr.result | 1 | |
| 1 | SIMPLE | results_4 | eq_ref | PRIMARY,type | PRIMARY | 4 | openvas.rr.result | 1 | |
+----+-------------+-----------+--------+---------------+---------+---------+-------------------+------+---------------------------------+
When I remove the LEFT JOIN
s, the query executes in about 1s, each LEFT JOIN
adds about one additional second execution time.
My question:
Can anyone explain, why the copy to temp table task of one join takes longer if there are more LEFT JOIN
s? Is MySQL copying the temp table several times for each JOIN?
How can I avoid this? Am I missing an index?
What I intend to accomplish: I have a table with scanning results of several hosts. Each result is classified in types ( "Hole", "Warning", "Note" or "Log"). I want to select each host and the corresponding amount of Holes, Warnings, Notes and Logs. As a "restriction" I have the fact, that not each host has each type of results.
Upvotes: 2
Views: 160
Reputation: 6872
You're joining a single table several times, which effectively is like joining multiple tables. You should be able to handle that with some case statements and a where clause instead. (In fact you may not need the where clause.)
SELECT rh.host, rh.report,
COUNT(CASE WHEN results.type = 'Hole' THEN 1 ELSE NULL END) as Holes,
COUNT(CASE WHEN results.type = 'Warning' THEN 1 ELSE NULL END) as Warnings,
COUNT(CASE WHEN results.type = 'Note' THEN 1 ELSE NULL END) as Notes,
COUNT(CASE WHEN results.type = 'Log' THEN 1 ELSE NULL END) as Logs
FROM
report_hosts rh
INNER JOIN
report_results rr
ON
rh.report = rr.report
LEFT OUTER JOIN
results
ON
rr.result = results.id
AND results.host = rh.host
WHERE
results.type = 'Hole'
OR results.type = 'Warning'
OR results.type = 'Note'
OR results.type = 'Log'
GROUP BY rh.host, rh.report
Case statements, IME, are not the greatest performers, but your data bloat from the many joins may offset that and give this better performance.
Upvotes: 3
Reputation: 16107
Using a lot of data (in your case an extra left join
) will mean storing it in memory.
If you deplete the buffers your query will need to be stored to a temporary result table on drive.
Try using the same number of left join
s but limiting the number of rows with a limit
. It should confirm that the problem lies in the buffers (meaning it will run faster).
Upvotes: 1