Reputation: 17
I have a SQL question which is to delete some rows from a table. The structure of the table is like a paired rows. It can be expressed in the following SQL:
create table #test
(
col1 int, col2 int, col3 int, id char(1), dtime datetime
)
insert into #test
values
(1,1,1,'a','2015-02-01 1:00:00')
,(1,1,1,'b','2015-02-01 1:00:01')
,(2,1,1,'a','2015-02-01 1:00:00')
,(2,1,1,'b','2015-02-01 1:00:01')
,(3,1,3,'b','2015-02-01 1:00:00') -- Remove this row
,(3,1,3,'a','2015-02-01 1:00:03')
,(3,1,3,'b','2015-02-01 1:00:04')
,(4,2,1,'a','2015-02-01 3:00:00')
,(4,2,1,'b','2015-02-01 3:00:01')
,(5,3,1,'a','2015-02-01 4:00:00')
,(5,3,1,'b','2015-02-01 4:00:01')
,(5,6,3,'b','2015-02-01 4:00:00') -- Remove this row
,(5,6,3,'a','2015-02-01 4:00:03')
,(5,6,3,'b','2015-02-01 4:00:04')
select *
from #test
order by col1,col2,col3
drop table #test
Sorry, I have to make it clear. This question is from a real dataflow. The data is about a workflow steps. It has a start time, and a complete time. Each step might have multiple rows (because the step is called multiple times). When I choose a begin time and end time to get the dataflow, you can expect some steps is cut at the complete time instead of the start time which I want.
The query is to remove the unpaired rows that begin with complete time.
As you seen, every two rows should consist of a column of 'a' and 'b', and start with 'a' -- the start time. But those two rows to be deleted (actually we do not know how many they are) starts with 'b'-- the complete time.
Upvotes: 0
Views: 478
Reputation: 12538
WITH Ordered AS
(
SELECT Col1, col2, col3, id, dtime,
ROW_NUMBER() OVER(PARTITION BY col1, col2, col3, id ORDER BY dtime DESC) AS Pos
FROM #test
)
--SELECT a.*, b.Pos
DELETE a
FROM #test AS a
INNER JOIN Ordered AS b ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 = b.col3
AND a.ID = b.ID AND a.dtime = b.dtime
AND b.Pos <> 1
This will remove all but the most recent of each duplicate.
Upvotes: 0
Reputation: 17020
Having a primary key makes deletions much easier. Adding one would be the ideal solution.
Without a primary key or some other unique constraint, there could be duplicate rows. The datetime column does not guarantee that the data is unique.
If there are duplicates, do you want all duplicate rows deleted? If so, you can delete them specifying all of the columns:
delete from #Test
where col 1 = 3
and col2 = 1
and col3 = 3
and id = 'b'
and dtime = '2015-02-01 1:00:00'
delete from #Test
where col 1 = 5
and col2 = 6
and col3 = 3
and id = 'b'
and dtime = 2015-02-01 4:00:00'
If you want all but one of the potential duplicates removed, you would have to number them and delete all matching rows after the first row.
Upvotes: 1
Reputation: 1258
As said, if there's not a primary key set, you have to tell it every value that makes it different from other. In this case:
DELETE FROM #test WHERE dtime ='2015-02-01 1:00:00' AND id = 'b' AND col1 = 3 AND col2 = 1 AND col3 = 3
But I warn you this isn't a good practice. You should set a primary key as you've already told.
Upvotes: 0
Reputation: 1455
You can not delete a specific row with non-unique values. So, you have to ad an id-column (primary key!)
Upvotes: 0