Reputation: 1148
-- Scroll down for an edit that I added --
So here's my scenario. I have a table that has an entry for every time someone makes a change to some data. The reason for this is that we need to be able to audit all the changes.
However, I only want to retrieve the latest record for a series of edits a user has made.
So let's say there are three users, user A, B and C.
User A makes 10 changes (10 entries in the table). User B makes 5 changes User A makes 3 more changes User C makes 2 changes
What I want to get back is: Most recent of the 2 records that C created Most recent of the 3 records that A created Most recent of the 5 records that B created Most recent of the 10 records that A created
For a total of 4 rows that I get back
Here's what I tried, but the problem is that the RowNum doesn't go back to 1 when the LastUpdatedBy changes:
WITH cte AS
(
SELECT
[LastUpdatedOn]
,[LastUpdatedBy]
,ROW_NUMBER() OVER(PARTITION BY [LastUpdatedBy] ORDER BY [LastUpdatedOn] DESC) [RowNum]
FROM [HistoricalTable]
)
SELECT
[LastUpdatedOn]
,[LastUpdatedBy]
,RowNum
FROM cte
--WHERE RowNum = 1
ORDER BY [LastUpdatedOn] DESC;
And here's the output I get (** asterisks represent the rows I want back)
LastUpdatedOn LastUpdatedBy RowNum
**2011-06-07 13:07:26.917 629 1**
2011-06-07 12:57:53.700 629 2
2011-06-07 12:57:44.387 629 3
2011-06-07 12:57:34.913 629 4
2011-06-07 12:57:25.040 629 5
2011-06-07 12:57:19.927 629 6
2011-06-07 12:55:17.460 629 7
2011-06-07 12:55:12.287 629 8
2011-06-07 12:30:34.377 629 9
**2011-06-07 11:54:05.727 4 1**
**2011-06-07 11:50:02.723 629 10** (If this number went back to 1, my query would have worked fine)
2011-06-07 11:26:43.053 629 11
2011-06-07 10:54:32.867 629 12
2011-06-07 10:46:32.107 629 13
2011-06-07 10:40:52.937 629 14
**2011-06-07 10:39:50.880 3 1**
------------------- EDIT --------------------
So I came up with a solution, but it's not very elegant and not sure if I like it, but it does the trick. This might give you a better understanding of what I'm trying to accomplish.
DECLARE @temp AS TABLE(LastUpdatedOn datetime, LastUpdatedBy int null, RowNum int);
DECLARE @newTable AS TABLE(LastUpdatedOn datetime, LastUpdatedBy int null);
DECLARE @lastUserId int = 0;
INSERT INTO @temp
SELECT
[LastUpdatedOn]
,[LastUpdatedBy]
,ROW_NUMBER() OVER(ORDER BY [LastUpdatedOn] DESC) [RowNum]
FROM [HistoricalTable]
DECLARE @totalRecords int;
SELECT @totalRecords = COUNT(*) FROM @temp;
DECLARE @counter int = 0;
WHILE @counter <= @totalRecords BEGIN
SET @counter = @counter + 1;
INSERT INTO @newTable
SELECT LastUpdatedOn, LastUpdatedBy
FROM @temp
WHERE RowNum = @counter AND (@lastUserId != LastUpdatedBy OR (LastUpdatedBy IS NULL));
SELECT @lastUserId = LastUpdatedBy FROM @temp WHERE RowNum = @counter;
END
SELECT * FROM @newTable;
And the data that is returned:
LastUpdatedOn LastUpdatedBy
2011-06-07 13:07:26.917 629
2011-06-07 11:54:05.727 4
2011-06-07 11:50:02.723 629
2011-06-07 10:39:50.880 3
Upvotes: 5
Views: 7933
Reputation: 32697
It struck me this morning that this is an islands problem. Here's my solution:
CREATE TABLE #tmp (
LastUpdatedBy INT,
LastUpdatedOn DATETIME
)
INSERT INTO #tmp
( LastUpdatedOn, LastUpdatedBy )
VALUES ( '2011-06-07 13:07:26.917', 629 ),
( '2011-06-07 12:57:53.700', 629 ),
( '2011-06-07 12:57:44.387', 629 ),
( '2011-06-07 12:57:34.913', 629 ),
( '2011-06-07 12:57:25.040', 629 ),
( '2011-06-07 12:57:19.927', 629 ),
( '2011-06-07 12:55:17.460', 629 ),
( '2011-06-07 12:55:12.287', 629 ),
( '2011-06-07 12:30:34.377', 629 ),
( '2011-06-07 11:54:05.727', 4 ),
( '2011-06-07 11:50:02.723', 629 ),
( '2011-06-07 11:26:43.053', 629 ),
( '2011-06-07 10:54:32.867', 629 ),
( '2011-06-07 10:46:32.107', 629 ),
( '2011-06-07 10:40:52.937', 629 ),
( '2011-06-07 10:39:50.880', 3 ) ;
WITH cte
AS ( SELECT [LastUpdatedOn],
[LastUpdatedBy],
ROW_NUMBER() OVER ( PARTITION BY [LastUpdatedBy] ORDER BY [LastUpdatedOn] DESC ) - ROW_NUMBER() OVER ( ORDER BY [LastUpdatedOn] DESC ) AS [Island]
FROM #tmp
),
cte2
AS ( SELECT *,
ROW_NUMBER() OVER ( PARTITION BY [Island] ORDER BY [LastUpdatedOn] DESC ) AS [rn]
FROM cte
)
SELECT [LastUpdatedOn],
[LastUpdatedBy]
FROM cte2
WHERE [rn] = 1
ORDER BY [LastUpdatedOn] DESC ;
The "trick" here is to note that if you keep track of the row_number both within a partition and for the whole set, the difference between the two will change when the partition changes.
Upvotes: 1
Reputation: 16559
Not sure if i'm missing something in your question but doesn't the following SQL answer the problem ?
declare @HistoricalTable table (LastUpdatedOn datetime, LastUpdatedBy int);
insert into @HistoricalTable (LastUpdatedOn, LastUpdatedBy) values
('2011-06-07 13:07:26.917', 629),('2011-06-07 12:57:53.700', 629),
('2011-06-07 12:57:44.387', 629),('2011-06-07 12:57:34.913', 629),
('2011-06-07 12:57:25.040', 629),('2011-06-07 12:57:19.927', 629),
('2011-06-07 12:55:17.460', 629),('2011-06-07 12:55:12.287', 629),
('2011-06-07 12:30:34.377', 629),('2011-06-07 11:54:05.727', 4),
('2011-06-07 11:50:02.723', 629),('2011-06-07 11:26:43.053', 629),
('2011-06-07 10:54:32.867', 629),('2011-06-07 10:46:32.107', 629),
('2011-06-07 10:40:52.937', 629),('2011-06-07 10:39:50.880', 3);
select
latest.*
from
(
select *, rank() over (partition by LastUpdatedBy order by LastUpdatedOn desc) as UpdateRank
from @HistoricalTable
) latest
where
latest.UpdateRank = 1
order by
latest.LastUpdatedBy;
LastUpdatedOn LastUpdatedBy UpdateRank
2011-06-07 10:39:50.880 3 1
2011-06-07 11:54:05.727 4 1
2011-06-07 13:07:26.917 629 1
Upvotes: 2
Reputation: 138960
;with cte as
(
select *,
row_number() over(order by LastUpdatedOn) as rn
from HistoricalTable
)
select C1.LastUpdatedOn,
C1.LastUpdatedBy
from cte as C1
left outer join cte as C2
on C1.rn = C2.rn-1
where C1.LastUpdatedBy <> coalesce(C2.LastUpdatedBy, 0)
Creating a row number for each row order by LastUpdatedOn
and joining to the next row and comparing if LastUpdatedBy is changed.
Beware of this coalesce(C2.LastUpdatedBy, 0)
. It is to get the last row and the 0
need to be some integer value not used as LastUpdatedBy
.
Upvotes: 6
Reputation: 34177
This is entirely untested but it might form the basis of a working solution:
SELECT
[Outer].[LastUpdatedOn],
[Outer].[LastUpdatedBy]
FROM [HistoricalTable] AS [Outer]
WHERE NOT EXISTS
(
SELECT *
FROM [HistoricalTable] AS [Middle]
WHERE [Middle].[LastUpdatedBy] = [Outer].[LastUpdatedBy]
AND [Middle].[LastUpdatedOn] > [Outer].[LastUpdatedOn]
AND [Middle].[LastUpdatedOn] <= ISNULL(
(
SELECT
MIN([Inner].[LastUpdatedOn])
FROM [HistoricalTable] AS [Inner]
WHERE [Inner].[LastUpdatedBy] != [Outer].[LastUpdatedBy]
AND [Inner].[LastUpdatedOn] > [Outer].[LastUpdatedOn]
), [Middle].[LastUpdatedOn])
)
Even if this approach works, the performance will probably be terrible assuming you have more than just a handful of rows.
For each row in the table it makes sure there does not exist any other row(s) by the same user between the context row and the oldest row that is more recent than the context row linked to a different user.
Upvotes: 0