Reputation: 1860

deleting duplicate row with no unique identifier

I have some data in a table that looks roughly like the following:

table stockData
(
tickId int not null,
timestamp datetime not null,
price decimal(18,5) not null
)

Neither tickId nor timestamp are unique, however the combination of tickId and timestamp is supposed to be unique.

I have some duplicate data in my table, and I'm attempting to remove it. However, I'm coming to the conclusion that there is not enough information with the given data for me to discern one row from the other, and basically no way for me to delete just one of the duplicate rows. My guess is that I will need to introduce some sort of identity column, which would help me to identify one row from the other.

Is this correct, or is there some magic way of deleting one but not both of the duplicate data with a query?

EDIT Edited to clarify that tickId and timestamp combo should be unique, but it's not because of the duplicate data.

Upvotes: 7

Answers (3)

NuNn DaDdY

Reputation: 2992

Maybe I'm not understanding your question correctly, but if "tickId" and "timestamp" are guaranteed to be unique then how do you have duplicate data in your table? Could you provide an example or two of what you mean?

However, if you have duplicates of all three columns inside the table the following script may work. Please test this and make a backup of the database before implementing as I just put it together.

declare @x table 
(
    tickId int not null,
    timestamp datetime not null,
    price decimal(18,5) not null
)

insert into @x (tickId, timestamp, price)
select tickId,
    timestamp,
    price
from stockData
group by tickId,
         timestamp,
         price
having count(*) > 1

union 

select tickId,
       timestamp,
       price
from stockData
group by tickId,
         timestamp,
         price
having count(*) = 1

delete 
from stockData

insert into stockData (tickId, timestamp, price)
select tickId,
       timestamp,
       price
from @x

alter table stockData add constraint
    pk_StockData primary key clustered (tickid, timestamp)

Upvotes: 0

Sergey Rybalkin

Reputation: 3026

Here is a query that will remove duplicates and leave exactly one copy of each unique row. It will work with SQL Server 2005 or higher:

WITH Dups AS
(
  SELECT tickId, timestamp, price,
    ROW_NUMBER() OVER(PARTITION BY tickid, timestamp ORDER BY (SELECT 0)) AS rn
  FROM stockData
)
DELETE FROM Dups WHERE rn > 1

Upvotes: 29

Jester

Reputation: 3317

select distinct * into temp_table from source_table (this table will be created for you)

delete from temp_table (what you don't need)

insert into sorce_table
select * from temp_table

Upvotes: 4

deleting duplicate row with no unique identifier

Answers (3)

Related Questions