Reputation: 1065
I originally had written the following
SELECT t1.TransactionNumber
FROM t1
JOIN
(
SELECT MAX(id) id
FROM t1
WHERE Period BETWEEN '01-11-2013' and '01-12-2014'
GROUP BY AccountNumber
) t2
on t1.id= t2.id
But it was too slow. It took around 20 seconds, so as a test, I changed it to the following
SELECT MAX(id) AS id
INTO #t2
FROM t1
WHERE Period BETWEEN '01-11-2013' and '01-12-2014'
GROUP BY AccountNumber
SELECT t1.id
FROM t1
JOIN #t2 t2
ON t1.id= t2.id
The second query took only 1 second to run. The second query does a index seek using the PK key, whereas the first key does a scan.
Note: id is the primary key clustered on the t1 table.
Upvotes: 5
Views: 350
Reputation: 66
The difference is that in the 1st query, the engine doesn't know the number of results in t2 (which I assume is a relatively small number compared to t1 count, but SQL server doesnt know that in advance). So the execution plan begins with t1 (looping on a large number of rows). However in 2nd query, t2 already has X number of records, which is known for the engine before executing the 2nd part of the query. So in that query the SQL engine will start the execution using t2 (SCAN t2 since its small), and for each key in t2, it will perform index seek in t1.
Upvotes: 0
Reputation: 2282
It is generally better for performance (though a bit more code) to explicitly declare the columns and data types instead of SELECT..INTO
. This may be faster:
CREATE TABLE #t2
(
id INT
);
INSERT INTO #t2(id)
VALUES
(
SELECT MAX(id)
FROM t1
WHERE Period <= '01-11-2013'
AND Period > '01-12-2014'
GROUP BY AccountNumber
);
SELECT t1.id
FROM t1
JOIN #t2 t2
ON t1.id= t2.id
Upvotes: 0
Reputation: 1370
Try This :
SELECT t1.TransactionNumber t1 WHERE t1.id = (SELECT MAX(id) id FROM t1 WHERE Period BETWEEN '01-11-2013' and '01-12-2014' GROUP BY AccountNumber)
Upvotes: 0
Reputation: 2219
This is a guess, but it could be because the statistics on the primary key are not working for you. If the query optimizer thinks you are only going to return 10 records from your inner join, but instead you return 100, it overflows the memory buffer and then ends up having to write the results of the subquery to disk. If you post your query execution plan results, it should become pretty obvious.
Upvotes: 1
Reputation: 39
Joins consume lot of resources and the computed result need to rejoin with the table again so its takes long time. Where as when you are using temp tables the result is already stored in temp table so join condition runs faster than subqueries.
Upvotes: 0
Reputation: 1013
The Main difference between these 2 queries is that the second one is strictly querying off of the index!!!
SELECT t1.TransactionNumber
FROM t1
JOIN
(
SELECT MAX(id) id
FROM t1
WHERE Period BETWEEN '01-11-2013' and '01-12-2014'
GROUP BY AccountNumber
) t2
on t1.id= t2.id
Is querying the transactionNumber column so therefore is cannot use the undex you have on the table, the second query is ONLY using the ID. This will make all the difference in the world.
Upvotes: 0
Reputation: 57381
Can't you place all your condition in the ON section?
SELECT t1.id
FROM t1
JOIN
(
SELECT id
FROM t1
WHERE <condition>
) t2
on t1.id = t2.id;
is transformed into
SELECT t1.id
FROM t1
JOIN t1 as t2
ON t1.id = t2.id AND <condition>
UPDATE:
Retrieving the last record in each group The link shows how to retrieve last record in group. The SQL is following
SELECT m1.*
FROM messages m1 LEFT JOIN messages m2
ON (m1.name = m2.name AND m1.id < m2.id)
WHERE m2.id IS NULL;
You can use this rather than group by
Upvotes: 0