Pure.Krome
Pure.Krome

Reputation: 86957

How can i remove duplicates from this Sql Server Table?

I have a single table in the Sql Server 2008 r2 DB. Every few seconds I import data into this table. At one point, the import was failing, so it was constantly importing the same data, creating duplicates. (basically, if the import read 20 lines, imported 19 and failed on 20 .. then those 19 were not in a transaction .. and thus got inserted).

Anyways, I'm trying to figure out how I can I remove all the duplicates and just the first (original) inserted row?

Here's the table schema - and please note that there's a few nullable fields.

CREATE TABLE [dbo].[LogEntries](
    [LogEntryId] [int] IDENTITY(1,1) NOT NULL,
    [GameFileId] [int] NOT NULL,
    [CreatedOn] [datetimeoffset](7) NOT NULL,
    [EventTypeId] [tinyint] NOT NULL,
    [Message] [nvarchar](max) NULL,
    [Code] [int] NULL,
    [Violation] [nvarchar](100) NULL,
    [ClientName] [nvarchar](100) NULL,
    [ClientGuid] [nvarchar](50) NULL,
    [ClientGuidReversed] [nvarchar](50) NULL,
    [ClientIpAndPort] [nvarchar](50) NULL,
 CONSTRAINT [PK_LogEntries] PRIMARY KEY CLUSTERED 
(
    [LogEntryId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Cheers :)

UPDATE: What are Duplicates (in this case?)

Damn sorry. forgot to define a duplicate. The LogEntryId is unique, so ignore that piece of info (it's not imported). all the rest of the data is imported. Here's two rows of data that are identical.

6459749 39  2010-11-05 00:00:25.0000000 +11:00  6   Violation (MULTIHACK) #70805    70805   MULTIHACK   angelb  aeda202c22ed41f7301d0673647c55d8    8d55c7463760d1037f14de22c202adea    220.246.157.194:57133
6459766 39  2010-11-05 00:00:25.0000000 +11:00  6   Violation (MULTIHACK) #70805    70805   MULTIHACK   angelb  aeda202c22ed41f7301d0673647c55d8    8d55c7463760d1037f14de22c202adea    220.246.157.194:57133

and to compare this to the top 5 ordered by desc

6505931 40  2010-11-08 23:39:16.0000000 +11:00  4   NULL    NULL    NULL    Zaphrolio   69ae1bfea616c244e5c223e51d5ceb8e    e8bec5d15e322c5e442c616aefb1ea96    175.38.209.80:10000
6505930 39  2010-11-08 23:39:04.0000000 +11:00  3   NULL    NULL    NULL    imBakedAsBro    8cf1b3b6a389229fa4adeec07dc087ce    ec780cd70ceeda4af922983a6b3b1fc8    110.175.83.45:10000
6505929 39  2010-11-08 23:39:03.0000000 +11:00  2   NULL    NULL    NULL    imBakedAsBro    NULL    NULL    110.175.83.45:10000
6505928 80  2010-11-08 23:39:04.0000000 +11:00  4   NULL    NULL    NULL    Asmo74  5ccf5ee85a6cf08da563bdcbfe75351d    d15357efbcdb365ad80fc6a58ee5fcc5    61.68.212.231:50273
6505927 80  2010-11-08 23:39:03.0000000 +11:00  4   NULL    NULL    NULL    McJellyfish c48218542918bec900a331a81e0a9d05    50d9a0e18a133a009ceb81924581284c    60.225.3.2:10000

Upvotes: 1

Views: 1746

Answers (2)

Remus Rusanu
Remus Rusanu

Reputation: 294277

with cte as (
 select row_number() over (
   partition by  
      [GameFileId]
    , [CreatedOn]
    , [EventTypeId]
    , [Message]
    , [Code]
    , [Violation]
    , [ClientName]
    , [ClientGuid]
    , [ClientGuidReversed]
    , [ClientIpAndPort]
 order by [LogEntryId]) as rn
 from LogEntries)
delete from cte
 where rn > 1;

Upvotes: 5

phatmanace
phatmanace

Reputation: 5021

in the abscence of any other information, UNION ALL is normally a good trick to blat out dupes

select * from table
union 
select * from table

Edited to reflect typo in comment...

Upvotes: 0

Related Questions