Reputation: 75
I would like to ask for your help.
I have a table which looks like this:
id | sequenceId
---------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
2 | 4
...
There are other columns as well but now it's not important. Another thing, the key is a keypair (id,sequenceId) and these are indexed in the table. What I would like is to get the last row for the given ids. For example, if myId = 1 -> give me the (1,3), myId = 2 -> give me the (2,4) record and so on. In my table, there are 500 ids and every id has 50 000 sequence ids so the size of the records is 500 * 50 000
My query:
SELECT
myId AS 'MyId',
MAX(sequenceId) AS 'SequenceId'
FROM myTable
WHERE myId in (SELECT myId from @MyIds)
GROUP BY(myId)
OPTION (RECOMPILE);
Unfortunately, this is not as fast as I want it. In my try, @MyIds contains all ids, 1 - 500 and in this case the execution time is approximately 1 sec. But I would like to make it faster.
Do you have any idea how I can make it faster? Maybe another query which is better than what I use?
Thank you for answers.
Br.
Upvotes: 0
Views: 3043
Reputation: 7918
As already mentioned - if you have an index on myId, sequenceId your query should fly. A columnstore index and/or batch mode processing can speed things up dramatically. If you can add a filter to your index, even better. Memory Optimized tables and/or other objects can speed things up too. All that said, let me introduce a new kind of index - the Virtual Index
. You can leverage RangeAB or Jeff Moden's FnTally.
Virtual Indexing using dbo.rangeAB
First a quick warm-up. Let's create a query the returns the numbers 1 to 10 in Ascending AND Descending order. Let's do it without an index and WITH a parallel execution plan.
SELECT r.RN, r.Op
FROM dbo.rangeAB(1,10,1,1) AS r
ORDER BY r.RN
OPTION (QUERYTRACEON 8649)
Returns:
RN Op
-------------------- --------------------
1 10
2 9
3 8
4 7
5 6
6 5
7 4
8 3
9 2
10 1
Execution plan:
Look ^^^ No Sort!!! So, for Descending ORDER BY's your query looks like this:
-- Last 3 Numbers - no index, no sort + Descending Order + Parallelism (if you want it)
SELECT TOP (3) r.Op
FROM dbo.rangeAB(1,10,1,1) AS r
ORDER BY r.RN ASC
--OPTION (QUERYTRACEON 8649);
Here we have a virtual FORWARD-ORDER scan that returns the rows in Descending order. No index, no sort operator required! This is not a trick, let's call the function twice and do a few things that require sorting (Grouping by two columns from separate function calls, Joining, traditional aggregates and we'll top it off with a presentation ORDER BY (not)sorted by a Window Ranking Function...
DECLARE @rows INT = 10;
SELECT
RN1 = r.RN,
RN1_DESC = @rows+1-r.RN,
RN2 = r2.RN,
RN1_Low = MIN(r.RN),
RN1_High = MAX(r.RN),
RN1_Avg = AVG(r.RN)
FROM dbo.rangeAB(1,@rows,1,1) AS r
LEFT JOIN dbo.rangeAB(1,3,1,1) AS r2 ON r.RN = r2.RN
GROUP BY r.RN, r2.RN
ORDER BY DENSE_RANK() OVER (ORDER BY r.RN);
Returns:
RN1 RN1_DESC RN2 RN1_Low RN1_High RN1_Avg
----- ----------- -------- ---------- ---------- --------------------
1 10 1 1 1 1
2 9 2 2 2 2
3 8 3 3 3 3
4 7 NULL 4 4 4
5 6 NULL 5 5 5
6 5 NULL 6 6 6
7 4 NULL 7 7 7
8 3 NULL 8 8 8
9 2 NULL 9 9 9
10 1 NULL 10 10 10
The result set was not intended to be meaningful, it's the execution plan I'm interested in; let's have a look.
Back to your Query
-- Sample data
DECLARE @table TABLE (id INT NOT NULL, sequenceId INT NOT NULL)--, INDEX xxx(id,sequenceId))
INSERT @table VALUES(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(2,4)
SELECT r.RN, sequenceId = MAX(t.sequenceId)
FROM
(
SELECT MIN(t.id), MAX(t.id), MIN(t.sequenceId), MAX(t.sequenceId)
FROM @table AS t
) AS mm(Mn,Mx,Mns,Mxs)
CROSS APPLY dbo.rangeAB(mm.Mn,mm.Mx,1,1) AS r
CROSS APPLY dbo.rangeAB(mm.Mns,mm.Mxs,1,1) AS r2
JOIN @table AS t
ON r.RN = t.id AND r2.RN = Mxs
GROUP BY r.RN
OPTION (QUERYTRACEON 8649);
No Index, no sort, no I/O, no loss of parallelism (regardless of direction) and no RBAR!
Upvotes: 0
Reputation: 1269773
I would recommend the following:
select i.myId,
(select max(t.sequenceId)
from myTable t
where t.myId = i.myId
)
from @MyIds i;
Then for performance you want an index on myTable(myId, sequenceId desc)
.
Upvotes: 0
Reputation: 7692
Your query is correct and relatively optimal; you probably wouldn't get any improvement by rewriting it in another way, other than replacing the table variable with indexed temporary table.
Performance optimisation is generally about indices. Depending on whether the id
column is indexed, one of the following options should help:
create index [IX_mytable_myid_sequenceid] on dbo.mytable (myid, sequenceid desc);
If the clustered index on the table is created on the myId
column, then you can save a bit of space:
create index [IX_mytable_sequenceid] on dbo.mytable (sequenceid desc);
Sort order is important, because unfortunately SQL Server cannot use a parallel plan when it has to perform a backward scan / seek. However, you can try both asc
and desc
sorts for the sequenceId
column; chances are, it will make no difference in your particular case.
Regardless of the indices, you probably need to replace your table variable with temporary table. Depending on the version of SQL Server you use, cardinality estimator assumes that table variable has either 1 or 100 rows. If your amount of data misses the estimate, execution plan will be sub-optimal. So the code should look like this:
create table #list (Id int primary key);
insert into #list (Id)
-- Assuming there are no duplicates, otherwise add DISTINCT
select MyId from @MyIds;
SELECT
t.myId AS 'MyId',
MAX(t.sequenceId) AS 'SequenceId'
FROM myTable t
inner join #list l on l.Id = t.myId
GROUP BY t.myId
-- OPTION (RECOMPILE);
Whether you should leave the option
clause depends on the performance.
Upvotes: 2
Reputation: 620
select
id.id as id,
seq as maxSequence,
data.someData as someData
from
(select id, max(sequenceId) as seq from #tab group by id) id
left join #tab data on id.id = data.id and id.seq = data.sequenceId
Upvotes: 0
Reputation: 16908
You can try with INNER JOIN as below-
SELECT
A.myId AS 'MyId',
MAX(A.sequenceId) AS 'SequenceId'
FROM myTable A
INNER JOIN @MyIds B
ON A.myId = B.myId
GROUP BY(A.myId)
This following script will return you max sequence value for each myID-
SELECT * FROM
(
SELECT myId,sequenceId,
ROW_NUMBER() OVER(PARTITION BY myId ORDER BY sequenceId DESC) RN
FROM myTable
)A
WHERE RN = 1
Upvotes: 0
Reputation: 4058
First, @MyIds is a table variable isn't it? How do you declare this? is it indexed? Add primary key on it:
DECLARE @MyIds TABLE (ID INT PRIMARY KEY)
Second, be sure your key is on myId+sequenceId
and not on sequenceId+myId
Third, avoid the IN
clause with many items, this is a bottleneck
This should be your best option:
SELECT myId MyId, MAX(sequenceId) SequenceId
FROM myTable t
WHERE EXISTS (SELECT TOP 1 'X' X from @MyIds m WHERE m.myId = t.myId)
GROUP BY myId
You can also try to force the filter after the grouping, have a try:
SELECT *
FROM (
SELECT TOP (9223372036854775807) myId MyId, MAX(sequenceId) SequenceId
FROM myTable t
GROUP BY myId
ORDER BY myId
) T
WHERE EXISTS (SELECT TOP 1 'X' X from @MyIds m WHERE m.myId = t.myId)
Upvotes: 0