user1816353
user1816353

Reputation: 17

SQL Server delete a specific row

I have a SQL question which is to delete some rows from a table. The structure of the table is like a paired rows. It can be expressed in the following SQL:

create table #test 
(
    col1 int, col2 int, col3 int, id char(1), dtime datetime
)

insert into #test 
values 
 (1,1,1,'a','2015-02-01 1:00:00')
,(1,1,1,'b','2015-02-01 1:00:01')
,(2,1,1,'a','2015-02-01 1:00:00')                        
,(2,1,1,'b','2015-02-01 1:00:01')
,(3,1,3,'b','2015-02-01 1:00:00') -- Remove this row
,(3,1,3,'a','2015-02-01 1:00:03')
,(3,1,3,'b','2015-02-01 1:00:04')
,(4,2,1,'a','2015-02-01 3:00:00')
,(4,2,1,'b','2015-02-01 3:00:01')
,(5,3,1,'a','2015-02-01 4:00:00')                        
,(5,3,1,'b','2015-02-01 4:00:01')
,(5,6,3,'b','2015-02-01 4:00:00') -- Remove this row
,(5,6,3,'a','2015-02-01 4:00:03')
,(5,6,3,'b','2015-02-01 4:00:04')

select * 
from #test 
order by col1,col2,col3

drop table #test

Sorry, I have to make it clear. This question is from a real dataflow. The data is about a workflow steps. It has a start time, and a complete time. Each step might have multiple rows (because the step is called multiple times). When I choose a begin time and end time to get the dataflow, you can expect some steps is cut at the complete time instead of the start time which I want.

The query is to remove the unpaired rows that begin with complete time.

As you seen, every two rows should consist of a column of 'a' and 'b', and start with 'a' -- the start time. But those two rows to be deleted (actually we do not know how many they are) starts with 'b'-- the complete time.

Upvotes: 0

Views: 478

Answers (4)

MartW
MartW

Reputation: 12538

WITH Ordered AS
(
    SELECT Col1, col2, col3, id, dtime,
        ROW_NUMBER() OVER(PARTITION BY col1, col2, col3, id ORDER BY dtime DESC) AS Pos
    FROM #test
)
--SELECT a.*, b.Pos
DELETE a
FROM #test AS a
INNER JOIN Ordered AS b ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 = b.col3
    AND a.ID = b.ID AND a.dtime = b.dtime
    AND b.Pos <> 1

This will remove all but the most recent of each duplicate.

Upvotes: 0

Paul Williams
Paul Williams

Reputation: 17020

Having a primary key makes deletions much easier. Adding one would be the ideal solution.

Without a primary key or some other unique constraint, there could be duplicate rows. The datetime column does not guarantee that the data is unique.

If there are duplicates, do you want all duplicate rows deleted? If so, you can delete them specifying all of the columns:

delete from #Test
where col 1 = 3
and col2 = 1
and col3 = 3
and id = 'b'
and dtime = '2015-02-01 1:00:00'

delete from #Test
where col 1 = 5
and col2 = 6
and col3 = 3
and id = 'b'
and dtime = 2015-02-01 4:00:00'

If you want all but one of the potential duplicates removed, you would have to number them and delete all matching rows after the first row.

Upvotes: 1

Pinx0
Pinx0

Reputation: 1258

As said, if there's not a primary key set, you have to tell it every value that makes it different from other. In this case:

DELETE FROM #test WHERE dtime ='2015-02-01 1:00:00' AND id = 'b' AND col1 = 3 AND col2 = 1 AND col3 = 3

But I warn you this isn't a good practice. You should set a primary key as you've already told.

Upvotes: 0

swe
swe

Reputation: 1455

You can not delete a specific row with non-unique values. So, you have to ad an id-column (primary key!)

Upvotes: 0

Related Questions