Reputation: 590
How can I get a RANK that restarts at partition change? I have this table:
ID Date Value
1 2015-01-01 1
2 2015-01-02 1 <redundant
3 2015-01-03 2
4 2015-01-05 2 <redundant
5 2015-01-06 1
6 2015-01-08 1 <redundant
7 2015-01-09 1 <redundant
8 2015-01-10 2
9 2015-01-11 3
10 2015-01-12 3 <redundant
and I'm trying to delete all the rows where the Value is not changed from the previous entry (marked with < redundant). I've tried using cursors but it takes too long, as the table has ~50 million rows.
I've also tried using RANK:
SELECT ID, Date, Value,
RANK() over(partition by Value order by Date ASC) Rank,
FROM DataLogging
ORDER BY Date ASC
but I get:
ID Date Value Rank (Rank)
1 2015-01-01 1 1 (1)
2 2015-01-02 1 2 (2)
3 2015-01-03 2 1 (1)
4 2015-01-05 2 2 (2)
5 2015-01-06 1 3 (1)
6 2015-01-08 1 4 (2)
7 2015-01-09 1 5 (3)
8 2015-01-10 2 3 (1)
9 2015-01-11 3 1 (1)
10 2015-01-12 3 2 (2)
in parantheses is the Rank I would want, so that I can filter out rows with Rank = 1 and delete the rest of the rows.
EDIT: I've accepted the answer that seemed the easiest to write, but unfortunately none of the answers runs fast enough for deleting the rows. In the end I've decided to use the CURSOR afterall. I've split the data in chuncks of about 250k rows and the cursor runs through and deletes the rows in ~11 mins per batch of 250k rows, and the answers below, with DELETE, take ~35 mins per batch of 250k rows.
Upvotes: 5
Views: 14940
Reputation: 1
Worked for my case! thanks I had to fetch the report_to change for an employee with respect to the previous reports_to valueand effdt. In other words, fetcth min effective date row for each reports_to change for an employee.
with tocheck as ( select T.emplid,T.reports_to,T.effdt, lag(reports_to) over (order by effdt) as prev_value from PS_JOB t ) select * from tocheck where reports_to <> prev_value or prev_value is null;
added changes further as p
Upvotes: 0
Reputation: 2540
This is interesting so I'd thought I'd jump in. Unfortunately, creating a solution with RANK()
(or rather, ROW_NUMBER()
) without first transforming the data looks to be unobtainable. In an attempt to transform the data, I came up with this solution that uses 1 ROW_NUMBER()
:
;WITH Ordered AS
(
SELECT ROW_NUMBER() OVER (ORDER BY [Date]) AS [Row], *
FROM DataLogging
),
Final AS
(
SELECT
o1.*, NULLIF(o1.Value - ISNULL(o2.Value, o1.Value - 1), 0) [Change]
FROM
Ordered o1
LEFT JOIN Ordered o2 ON
o1.[Row] = o2.[Row] + 1
)
SELECT * FROM Final
In the last Change
column, the value will be NULL
if there is no change in value (but will have the difference if there is a change).
So to do the delete, change the select to
DELETE FROM DataLogging where Change IS NULL
Edit: Lag would work here too but I was visualizing the solution as I went along and completely forgot about that.
Upvotes: 0
Reputation: 45096
select *
from ( select ID, Date, Value, lag(Value, 1, 0) over (order by ID) as ValueLag
from table ) tt
where ValueLag is null or ValueLag <> Value
if the order is Date then over (order by Date)
this should show you good and bad - it is based on ID - it you need date then revise
it may look like a long way around but it should be pretty efficient
declare @tt table (id tinyint, val tinyint);
insert into @tt values
( 1, 1),
( 2, 1),
( 3, 2),
( 4, 2),
( 5, 1),
( 6, 1),
( 7, 1),
( 8, 2),
( 9, 3),
(10, 3);
select id, val, LAG(val) over (order by id) as lagVal
from @tt;
-- find the good
select id, val
from ( select id, val, LAG(val) over (order by id) as lagVal
from @tt
) tt
where lagVal is null or lagVal <> val
-- select the bad
select tt.id, tt.val
from @tt tt
left join ( select id, val
from ( select id, val, LAG(val) over (order by id) as lagVal
from @tt
) ttt
where ttt.lagVal is null or ttt.lagVal <> ttt.val
) tttt
on tttt.id = tt.id
where tttt.id is null
Upvotes: 2
Reputation: 1269733
If you want to delete the rows, I would suggest you use lag()
:
with todelete as (
select t.*, lag(value) over (order by date) as prev_value
from t
)
delete from todelete
where value = prev_value;
I'm not quite sure what rank()
has to do with the problem.
EDIT:
To see the rows not deleted with the same logic:
with todelete as (
select t.*, lag(value) over (order by date) as prev_value
from t
)
select *
from todelete
where value <> prev_value or prev_value is null;
The where
clause is just the inverse of the where
clause in the first query, taking NULL
values into account.
Upvotes: 2
Reputation: 70638
Here is a somewhat convoluted way to do it:
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY [Date]) RN1,
ROW_NUMBER() OVER(PARTITION BY Value ORDER BY [Date]) RN2
FROM dbo.YourTable
), CTE2 AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY Value, RN1 - RN2 ORDER BY [Date]) N
FROM CTE
)
SELECT *
FROM CTE2
ORDER BY ID;
The results are:
╔════╦════════════╦═══════╦═════╦═════╦═══╗
║ ID ║ Date ║ Value ║ RN1 ║ RN2 ║ N ║
╠════╬════════════╬═══════╬═════╬═════╬═══╣
║ 1 ║ 2015-01-01 ║ 1 ║ 1 ║ 1 ║ 1 ║
║ 2 ║ 2015-01-02 ║ 1 ║ 2 ║ 2 ║ 2 ║
║ 3 ║ 2015-01-03 ║ 2 ║ 3 ║ 1 ║ 1 ║
║ 4 ║ 2015-01-05 ║ 2 ║ 4 ║ 2 ║ 2 ║
║ 5 ║ 2015-01-06 ║ 1 ║ 5 ║ 3 ║ 1 ║
║ 6 ║ 2015-01-08 ║ 1 ║ 6 ║ 4 ║ 2 ║
║ 7 ║ 2015-01-09 ║ 1 ║ 7 ║ 5 ║ 3 ║
║ 8 ║ 2015-01-10 ║ 2 ║ 8 ║ 3 ║ 1 ║
║ 9 ║ 2015-01-11 ║ 3 ║ 9 ║ 1 ║ 1 ║
║ 10 ║ 2015-01-12 ║ 3 ║ 10 ║ 2 ║ 2 ║
╚════╩════════════╩═══════╩═════╩═════╩═══╝
To delete the rows you don't want, you just need to do:
DELETE FROM CTE2
WHERE N > 1;
Upvotes: 5