Makotosan
Makotosan

Reputation: 1148

SQL Server - Select most recent records from a group of similar records

-- Scroll down for an edit that I added --

So here's my scenario. I have a table that has an entry for every time someone makes a change to some data. The reason for this is that we need to be able to audit all the changes.

However, I only want to retrieve the latest record for a series of edits a user has made.

So let's say there are three users, user A, B and C.

User A makes 10 changes (10 entries in the table). User B makes 5 changes User A makes 3 more changes User C makes 2 changes

What I want to get back is: Most recent of the 2 records that C created Most recent of the 3 records that A created Most recent of the 5 records that B created Most recent of the 10 records that A created

For a total of 4 rows that I get back

Here's what I tried, but the problem is that the RowNum doesn't go back to 1 when the LastUpdatedBy changes:

WITH cte AS 
(
    SELECT 
        [LastUpdatedOn]
        ,[LastUpdatedBy]
        ,ROW_NUMBER() OVER(PARTITION BY [LastUpdatedBy] ORDER BY [LastUpdatedOn] DESC) [RowNum]
    FROM [HistoricalTable] 
)           
SELECT 
    [LastUpdatedOn]
    ,[LastUpdatedBy]
    ,RowNum
FROM cte
--WHERE RowNum = 1 
ORDER BY [LastUpdatedOn] DESC;

And here's the output I get (** asterisks represent the rows I want back)

LastUpdatedOn   LastUpdatedBy   RowNum
**2011-06-07 13:07:26.917   629 1**
2011-06-07 12:57:53.700 629 2
2011-06-07 12:57:44.387 629 3
2011-06-07 12:57:34.913 629 4
2011-06-07 12:57:25.040 629 5
2011-06-07 12:57:19.927 629 6
2011-06-07 12:55:17.460 629 7
2011-06-07 12:55:12.287 629 8
2011-06-07 12:30:34.377 629 9
**2011-06-07 11:54:05.727   4   1**
**2011-06-07 11:50:02.723   629 10** (If this number went back to 1, my query would have worked fine)
2011-06-07 11:26:43.053 629 11
2011-06-07 10:54:32.867 629 12
2011-06-07 10:46:32.107 629 13
2011-06-07 10:40:52.937 629 14
**2011-06-07 10:39:50.880   3   1**

------------------- EDIT --------------------

So I came up with a solution, but it's not very elegant and not sure if I like it, but it does the trick. This might give you a better understanding of what I'm trying to accomplish.

DECLARE @temp AS TABLE(LastUpdatedOn datetime, LastUpdatedBy int null, RowNum int);

DECLARE @newTable AS TABLE(LastUpdatedOn datetime, LastUpdatedBy int null);

DECLARE @lastUserId int = 0;

INSERT INTO @temp
SELECT 
    [LastUpdatedOn]
    ,[LastUpdatedBy]
    ,ROW_NUMBER() OVER(ORDER BY [LastUpdatedOn] DESC) [RowNum]
    FROM [HistoricalTable] 

DECLARE @totalRecords int;
SELECT @totalRecords = COUNT(*) FROM @temp;
DECLARE @counter int = 0;
WHILE @counter <= @totalRecords BEGIN
    SET @counter = @counter + 1;

    INSERT INTO @newTable
    SELECT LastUpdatedOn, LastUpdatedBy
    FROM @temp 
    WHERE RowNum = @counter AND (@lastUserId != LastUpdatedBy OR (LastUpdatedBy IS NULL));

    SELECT @lastUserId = LastUpdatedBy  FROM @temp WHERE RowNum = @counter;     
END

SELECT * FROM @newTable;

And the data that is returned:

LastUpdatedOn   LastUpdatedBy
2011-06-07 13:07:26.917 629
2011-06-07 11:54:05.727 4
2011-06-07 11:50:02.723 629
2011-06-07 10:39:50.880 3

Upvotes: 5

Views: 7933

Answers (4)

Ben Thul
Ben Thul

Reputation: 32697

It struck me this morning that this is an islands problem. Here's my solution:

CREATE TABLE #tmp (
 LastUpdatedBy INT,
 LastUpdatedOn DATETIME
)

INSERT  INTO #tmp
        ( LastUpdatedOn, LastUpdatedBy )
VALUES  ( '2011-06-07 13:07:26.917', 629 ),
        ( '2011-06-07 12:57:53.700', 629 ),
        ( '2011-06-07 12:57:44.387', 629 ),
        ( '2011-06-07 12:57:34.913', 629 ),
        ( '2011-06-07 12:57:25.040', 629 ),
        ( '2011-06-07 12:57:19.927', 629 ),
        ( '2011-06-07 12:55:17.460', 629 ),
        ( '2011-06-07 12:55:12.287', 629 ),
        ( '2011-06-07 12:30:34.377', 629 ),
        ( '2011-06-07 11:54:05.727', 4 ),
        ( '2011-06-07 11:50:02.723', 629 ),
        ( '2011-06-07 11:26:43.053', 629 ),
        ( '2011-06-07 10:54:32.867', 629 ),
        ( '2011-06-07 10:46:32.107', 629 ),
        ( '2011-06-07 10:40:52.937', 629 ),
        ( '2011-06-07 10:39:50.880', 3 ) ;

WITH    cte
          AS ( SELECT   [LastUpdatedOn],
                        [LastUpdatedBy],
                        ROW_NUMBER() OVER ( PARTITION BY [LastUpdatedBy] ORDER BY [LastUpdatedOn] DESC ) - ROW_NUMBER() OVER ( ORDER BY [LastUpdatedOn] DESC ) AS [Island]
               FROM     #tmp
             ),
        cte2
          AS ( SELECT   *,
                        ROW_NUMBER() OVER ( PARTITION BY [Island] ORDER BY [LastUpdatedOn] DESC ) AS [rn]
               FROM     cte
             )
    SELECT  [LastUpdatedOn],
            [LastUpdatedBy]
    FROM    cte2
    WHERE   [rn] = 1
    ORDER BY [LastUpdatedOn] DESC ;

The "trick" here is to note that if you keep track of the row_number both within a partition and for the whole set, the difference between the two will change when the partition changes.

Upvotes: 1

Jon Black
Jon Black

Reputation: 16559

Not sure if i'm missing something in your question but doesn't the following SQL answer the problem ?

declare @HistoricalTable table (LastUpdatedOn datetime, LastUpdatedBy int);

insert into @HistoricalTable (LastUpdatedOn, LastUpdatedBy) values 
('2011-06-07 13:07:26.917', 629),('2011-06-07 12:57:53.700', 629),
('2011-06-07 12:57:44.387', 629),('2011-06-07 12:57:34.913', 629),
('2011-06-07 12:57:25.040', 629),('2011-06-07 12:57:19.927', 629),
('2011-06-07 12:55:17.460', 629),('2011-06-07 12:55:12.287', 629),
('2011-06-07 12:30:34.377', 629),('2011-06-07 11:54:05.727', 4),
('2011-06-07 11:50:02.723', 629),('2011-06-07 11:26:43.053', 629),
('2011-06-07 10:54:32.867', 629),('2011-06-07 10:46:32.107', 629),
('2011-06-07 10:40:52.937', 629),('2011-06-07 10:39:50.880', 3);

select 
 latest.* 
from
(
 select *, rank() over (partition by LastUpdatedBy order by LastUpdatedOn desc) as UpdateRank 
  from @HistoricalTable
) latest
where
 latest.UpdateRank = 1
order by
 latest.LastUpdatedBy;

LastUpdatedOn           LastUpdatedBy   UpdateRank
2011-06-07 10:39:50.880            3            1
2011-06-07 11:54:05.727            4            1
2011-06-07 13:07:26.917          629            1

Upvotes: 2

Mikael Eriksson
Mikael Eriksson

Reputation: 138960

;with cte as
(
  select *,
    row_number() over(order by LastUpdatedOn) as rn
  from HistoricalTable
)
select C1.LastUpdatedOn,
       C1.LastUpdatedBy
from cte as C1
  left outer join cte as C2
    on C1.rn = C2.rn-1
where C1.LastUpdatedBy <> coalesce(C2.LastUpdatedBy, 0)

Creating a row number for each row order by LastUpdatedOn and joining to the next row and comparing if LastUpdatedBy is changed. Beware of this coalesce(C2.LastUpdatedBy, 0). It is to get the last row and the 0 need to be some integer value not used as LastUpdatedBy.

Upvotes: 6

Daniel Renshaw
Daniel Renshaw

Reputation: 34177

This is entirely untested but it might form the basis of a working solution:

SELECT
    [Outer].[LastUpdatedOn],
    [Outer].[LastUpdatedBy]
FROM [HistoricalTable] AS [Outer]
WHERE NOT EXISTS
(
    SELECT *
    FROM [HistoricalTable] AS [Middle]
    WHERE [Middle].[LastUpdatedBy] = [Outer].[LastUpdatedBy]
        AND [Middle].[LastUpdatedOn] > [Outer].[LastUpdatedOn]
        AND [Middle].[LastUpdatedOn] <= ISNULL(
        (
            SELECT
                MIN([Inner].[LastUpdatedOn])
            FROM [HistoricalTable] AS [Inner]
            WHERE [Inner].[LastUpdatedBy] != [Outer].[LastUpdatedBy]
                AND [Inner].[LastUpdatedOn] > [Outer].[LastUpdatedOn]
        ), [Middle].[LastUpdatedOn])
)

Even if this approach works, the performance will probably be terrible assuming you have more than just a handful of rows.

For each row in the table it makes sure there does not exist any other row(s) by the same user between the context row and the oldest row that is more recent than the context row linked to a different user.

Upvotes: 0

Related Questions