koichirose
koichirose

Reputation: 1825

Slow query with HAVING clause - can I speed it up?

I have the following query which produces the expected results but is very slow (it takes about 10 seconds. The gstats table has about 130k rows in my development environment and is much bigger in production):

SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN(
    SELECT g.site_id
    FROM gstats g
    WHERE g.start_date > '2015-04-30'
    GROUP BY g.site_id
    HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC

Am I doing something wrong? How can I speed this up?

Would adding indexes/using a view help?

Upvotes: 2

Views: 5430

Answers (3)

willeM_ Van Onsem
willeM_ Van Onsem

Reputation: 477200

A quick fix would be to filter in the subquery:

SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND s.id IN(
    SELECT g.site_id
    FROM gstats g
    WHERE g.start_date > '2015-04-30' AND g.site_id = s.id
    GROUP BY g.site_id
    HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC

Since otherwise, you do such grouping query for every possible candidate. We can make this more elegant with EXISTS:

SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
AND EXISTS (
    SELECT 1
    FROM gstats g
    WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
    HAVING SUM(g.results) > 100
)
GROUP BY s.id
ORDER BY dcount ASC

But we are not done yet, now we will use the EXISTS for every element. That's weird since the query only depends on s.id, so it only depends on the group, not the individual rows. So a potential speedup, but this depends on the sizes of the tables, etc. is to move the condition to a HAVING statement:

SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s
LEFT JOIN deals d ON (s.id = d.site_id AND d.is_active = 1)
WHERE (s.is_active = 1)
GROUP BY s.id
ORDER BY dcount ASC
HAVING EXISTS (
    SELECT 1
    FROM gstats g
    WHERE g.site_id = s.id AND g.start_date > '2015-04-30'
    HAVING SUM(g.results) > 100
)

Upvotes: 4

Thorsten Kettner
Thorsten Kettner

Reputation: 95052

The query looks fine as is. I suggest the following indexes:

create index idx_gstats on gstats(start_date, results, site_id);
create index idx_deals1 on deals(is_active, site_id);
create index idx_deals2 on deals(site_id, is_active);

Then see the execution plan for the query and remove the deals index that doesn't get used.

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270463

Try moving the subquery to the FROM clause:

SELECT count(d.id) AS dcount, s.id, s.name
FROM sites s JOIN
     (SELECT g.site_id
      FROM gstats g
      WHERE g.start_date > '2015-04-30'
      GROUP BY g.site_id
      HAVING SUM(g.results) > 100
     ) g
     ON g.site_id = s.site_id LEFT JOIN
     deals d
     ON s.id = d.site_id AND d.is_active = 1
WHERE s.is_active = 1
GROUP BY s.id
ORDER BY dcount ASC;

I assume you have indexes on the join columns. You might also find that this helps performance:

SELECT s.id, s.name,
       (SELECT COUNT(*)
        FROM deals d
        WHERE d.site_id = s.id AND d.is_active = 1
       ) as dcount
FROM sites s JOIN
     (SELECT g.site_id
      FROM gstats g
      WHERE g.start_date > '2015-04-30'
      GROUP BY g.site_id
      HAVING SUM(g.results) > 100
     ) g
     ON g.site_id = s.site_id 
WHERE s.is_active = 1
ORDER BY dcount ASC;

For this version, you want an index on deals(site_id, is_active).

Upvotes: 1

Related Questions