Reputation: 3095
I have a table that records transactions. I want to select all of the IDs that have exactly 4 transactions of value 2000, but then exclude those that have a single transaction of 2500.
SELECT t.tuserid, COUNT(*) AS CNT
FROM transactions t
WHERE t.amt = 2000
GROUP BY t.tuserid
HAVING CNT = 4;
That part is simple enough, but I'm not sure how to efficiently exclude anyone who has any transaction with a t.amt = 2500. Would a simple sub query on the where clause be the most efficient?
SELECT t.tuserid, COUNT(*) AS CNT
FROM transactions t
WHERE t.amt = 2000
AND t.tuserid NOT IN (SELECT x.tuserid FROM transactions x WHERE x.amt=2500)
GROUP BY t.tuserid
HAVING CNT = 4;
The transactions table is large and I'm not sure that a sub-query is the most efficient way to run this process.
Upvotes: 4
Views: 76
Reputation: 585
I don't have the resources to test this, but an alternative without a subquery could be to use only SUM
SELECT
t.tuserid,
SUM(CASE WHEN t.amt = 2000 THEN 1 ELSE 0 END CASE) AS CNT1,
SUM(CASE WHEN t.amt = 2500 THEN 1 ELSE 0 END CASE) AS CNT2
FROM
transactions t
GROUP BY
t.tuserid
HAVING
CNT1 = 4 AND CNT2 = 0;
Upvotes: 0
Reputation: 1925
Yes a sub query is fine here. Your subquery does not use any "variables" from the outer query so it's not a correlated subquery. A correlated is usually "executed" for each row of the outer query and are often a performance issue.
Your subquery would only be called one time per outer query call.
In addition it's allways a good idea to think about your indexes, so it would be a good idea to put one index on your used field inside the where clause. Then both queries should be in a complexity class of O(log(n)) which should be really fast even for big data tables.
Upvotes: 3