Reputation: 1825
I have the following query which produces the expected results but is very slow (it takes about 10 seconds. The gstats table has about 130k rows in my development environment and is much bigger in production):
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN(
SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30'
GROUP BY g.site_id
HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
Am I doing something wrong? How can I speed this up?
Would adding indexes/using a view help?
Upvotes: 2
Views: 5430
Reputation: 477200
A quick fix would be to filter in the subquery:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN(
SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30' AND g.site_id = s.id
GROUP BY g.site_id
HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
Since otherwise, you do such grouping query for every possible candidate. We can make this more elegant with EXISTS
:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND EXISTS (
SELECT 1
FROM gstats g
WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC
But we are not done yet, now we will use the EXISTS
for every element. That's weird since the query only depends on s.id
, so it only depends on the group, not the individual rows. So a potential speedup, but this depends on the sizes of the tables, etc. is to move the condition to a HAVING
statement:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
GROUP BY s.id
ORDER BY dcount ASC
HAVING EXISTS (
SELECT 1
FROM gstats g
WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
HAVING SUM(g.results) > 100
)
Upvotes: 4
Reputation: 95052
The query looks fine as is. I suggest the following indexes:
create index idx_gstats on gstats(start_date, results, site_id);
create index idx_deals1 on deals(is_active, site_id);
create index idx_deals2 on deals(site_id, is_active);
Then see the execution plan for the query and remove the deals index that doesn't get used.
Upvotes: 0
Reputation: 1270463
Try moving the subquery to the FROM
clause:
SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s JOIN
(SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30'
GROUP BY g.site_id
HAVING SUM(g.results) > 100
) g
ON g.site_id = s.site_id LEFT JOIN
deals d
ON s.id = d.site_id AND d.is_active = 1
WHERE s.is_active = 1
GROUP BY s.id
ORDER BY dcount ASC;
I assume you have indexes on the join
columns. You might also find that this helps performance:
SELECT s.id, s.name,
(SELECT COUNT(*)
FROM deals d
WHERE d.site_id = s.id AND d.is_active = 1
) as dcount
FROM sites s JOIN
(SELECT g.site_id
FROM gstats g
WHERE g.start_date > '2015-04-30'
GROUP BY g.site_id
HAVING SUM(g.results) > 100
) g
ON g.site_id = s.site_id
WHERE s.is_active = 1
ORDER BY dcount ASC;
For this version, you want an index on deals(site_id, is_active)
.
Upvotes: 1