josamiti
josamiti

Reputation: 75

How can I get the last record for a given id in the fastest way? (MS SQL)

I would like to ask for your help.

I have a table which looks like this:

id | sequenceId
---------------
1 | 1
1 | 2
1 | 3
2 | 1
2 | 2
2 | 3
2 | 4
...

There are other columns as well but now it's not important. Another thing, the key is a keypair (id,sequenceId) and these are indexed in the table. What I would like is to get the last row for the given ids. For example, if myId = 1 -> give me the (1,3), myId = 2 -> give me the (2,4) record and so on. In my table, there are 500 ids and every id has 50 000 sequence ids so the size of the records is 500 * 50 000

My query:

SELECT
     myId AS 'MyId',
     MAX(sequenceId) AS 'SequenceId'
FROM myTable
WHERE myId in (SELECT myId from @MyIds)
GROUP BY(myId)
OPTION (RECOMPILE);

Unfortunately, this is not as fast as I want it. In my try, @MyIds contains all ids, 1 - 500 and in this case the execution time is approximately 1 sec. But I would like to make it faster.

Do you have any idea how I can make it faster? Maybe another query which is better than what I use?

Thank you for answers.

Br.

Upvotes: 0

Views: 3043

Answers (6)

Alan Burstein
Alan Burstein

Reputation: 7918

As already mentioned - if you have an index on myId, sequenceId your query should fly. A columnstore index and/or batch mode processing can speed things up dramatically. If you can add a filter to your index, even better. Memory Optimized tables and/or other objects can speed things up too. All that said, let me introduce a new kind of index - the Virtual Index. You can leverage RangeAB or Jeff Moden's FnTally.

Virtual Indexing using dbo.rangeAB

First a quick warm-up. Let's create a query the returns the numbers 1 to 10 in Ascending AND Descending order. Let's do it without an index and WITH a parallel execution plan.

SELECT   r.RN, r.Op
FROM     dbo.rangeAB(1,10,1,1) AS r
ORDER BY r.RN
OPTION (QUERYTRACEON 8649)

Returns:

RN                   Op
-------------------- --------------------
1                    10
2                    9
3                    8
4                    7
5                    6
6                    5
7                    4
8                    3
9                    2
10                   1

Execution plan:

enter image description here

Look ^^^ No Sort!!! So, for Descending ORDER BY's your query looks like this:

-- Last 3 Numbers - no index, no sort + Descending Order + Parallelism (if you want it)
SELECT TOP (3) r.Op
FROM     dbo.rangeAB(1,10,1,1) AS r
ORDER BY r.RN ASC
--OPTION (QUERYTRACEON 8649);

Here we have a virtual FORWARD-ORDER scan that returns the rows in Descending order. No index, no sort operator required! This is not a trick, let's call the function twice and do a few things that require sorting (Grouping by two columns from separate function calls, Joining, traditional aggregates and we'll top it off with a presentation ORDER BY (not)sorted by a Window Ranking Function...

DECLARE @rows INT = 10;

SELECT
  RN1      = r.RN,
  RN1_DESC = @rows+1-r.RN,
  RN2      = r2.RN,
  RN1_Low  = MIN(r.RN),
  RN1_High = MAX(r.RN),
  RN1_Avg  = AVG(r.RN)
FROM      dbo.rangeAB(1,@rows,1,1) AS r
LEFT JOIN dbo.rangeAB(1,3,1,1)  AS r2 ON r.RN = r2.RN
GROUP BY  r.RN, r2.RN 
ORDER BY  DENSE_RANK() OVER (ORDER BY r.RN);

Returns:

RN1   RN1_DESC    RN2      RN1_Low    RN1_High   RN1_Avg
----- ----------- -------- ---------- ---------- --------------------
1     10          1        1          1          1
2     9           2        2          2          2
3     8           3        3          3          3
4     7           NULL     4          4          4
5     6           NULL     5          5          5
6     5           NULL     6          6          6
7     4           NULL     7          7          7
8     3           NULL     8          8          8
9     2           NULL     9          9          9
10    1           NULL     10         10         10

The result set was not intended to be meaningful, it's the execution plan I'm interested in; let's have a look.

enter image description here

Back to your Query

-- Sample data
DECLARE @table TABLE (id INT NOT NULL, sequenceId INT NOT NULL)--, INDEX xxx(id,sequenceId))
INSERT @table VALUES(1,1),(1,2),(1,3),(2,1),(2,2),(2,3),(2,4)

SELECT r.RN, sequenceId = MAX(t.sequenceId)
FROM        
(
  SELECT MIN(t.id), MAX(t.id), MIN(t.sequenceId), MAX(t.sequenceId) 
  FROM   @table AS t
) AS mm(Mn,Mx,Mns,Mxs)
CROSS APPLY dbo.rangeAB(mm.Mn,mm.Mx,1,1)   AS r
CROSS APPLY dbo.rangeAB(mm.Mns,mm.Mxs,1,1) AS r2
JOIN        @table                         AS t 
  ON        r.RN = t.id AND r2.RN = Mxs
GROUP BY    r.RN
OPTION (QUERYTRACEON 8649);

No Index, no sort, no I/O, no loss of parallelism (regardless of direction) and no RBAR!

enter image description here

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269773

I would recommend the following:

select i.myId,
       (select max(t.sequenceId)
        from myTable t
        where t.myId = i.myId
       )
from @MyIds i;

Then for performance you want an index on myTable(myId, sequenceId desc).

Upvotes: 0

Roger Wolf
Roger Wolf

Reputation: 7692

Your query is correct and relatively optimal; you probably wouldn't get any improvement by rewriting it in another way, other than replacing the table variable with indexed temporary table.

Performance optimisation is generally about indices. Depending on whether the id column is indexed, one of the following options should help:

create index [IX_mytable_myid_sequenceid] on dbo.mytable (myid, sequenceid desc);

If the clustered index on the table is created on the myId column, then you can save a bit of space:

create index [IX_mytable_sequenceid] on dbo.mytable (sequenceid desc);

Sort order is important, because unfortunately SQL Server cannot use a parallel plan when it has to perform a backward scan / seek. However, you can try both asc and desc sorts for the sequenceId column; chances are, it will make no difference in your particular case.

Regardless of the indices, you probably need to replace your table variable with temporary table. Depending on the version of SQL Server you use, cardinality estimator assumes that table variable has either 1 or 100 rows. If your amount of data misses the estimate, execution plan will be sub-optimal. So the code should look like this:

create table #list (Id int primary key);

insert into #list (Id)
-- Assuming there are no duplicates, otherwise add DISTINCT
select MyId from @MyIds;

SELECT
     t.myId AS 'MyId',
     MAX(t.sequenceId) AS 'SequenceId'
FROM myTable t
  inner join #list l on l.Id = t.myId
GROUP BY t.myId
-- OPTION (RECOMPILE);

Whether you should leave the option clause depends on the performance.

Upvotes: 2

pi.314
pi.314

Reputation: 620

select
    id.id as id,
    seq as maxSequence,
    data.someData as someData
from
(select id, max(sequenceId) as seq from #tab group by id) id
left join #tab data on id.id = data.id and id.seq = data.sequenceId

Upvotes: 0

mkRabbani
mkRabbani

Reputation: 16908

You can try with INNER JOIN as below-

SELECT
A.myId AS 'MyId',
MAX(A.sequenceId) AS 'SequenceId'
FROM myTable A
INNER JOIN @MyIds B
ON A.myId = B.myId
GROUP BY(A.myId)

This following script will return you max sequence value for each myID-

SELECT * FROM 
(
    SELECT myId,sequenceId, 
    ROW_NUMBER() OVER(PARTITION BY myId ORDER BY sequenceId DESC) RN
    FROM  myTable
)A
WHERE RN = 1

Upvotes: 0

MtwStark
MtwStark

Reputation: 4058

First, @MyIds is a table variable isn't it? How do you declare this? is it indexed? Add primary key on it:

DECLARE @MyIds TABLE (ID INT PRIMARY KEY)

Second, be sure your key is on myId+sequenceId and not on sequenceId+myId

Third, avoid the IN clause with many items, this is a bottleneck

This should be your best option:

SELECT myId MyId, MAX(sequenceId) SequenceId
FROM myTable t
WHERE EXISTS (SELECT TOP 1 'X' X from @MyIds m WHERE m.myId = t.myId)
GROUP BY myId

You can also try to force the filter after the grouping, have a try:

SELECT * 
FROM (
    SELECT TOP (9223372036854775807) myId MyId, MAX(sequenceId) SequenceId
    FROM myTable t
    GROUP BY myId
    ORDER BY myId
) T
WHERE EXISTS (SELECT TOP 1 'X' X from @MyIds m WHERE m.myId = t.myId)

Upvotes: 0

Related Questions