Reputation: 86957
I have a single table in the Sql Server 2008 r2 DB. Every few seconds I import data into this table. At one point, the import was failing, so it was constantly importing the same data, creating duplicates. (basically, if the import read 20 lines, imported 19 and failed on 20 .. then those 19 were not in a transaction .. and thus got inserted).
Anyways, I'm trying to figure out how I can I remove all the duplicates and just the first (original) inserted row?
Here's the table schema - and please note that there's a few nullable fields.
CREATE TABLE [dbo].[LogEntries](
[LogEntryId] [int] IDENTITY(1,1) NOT NULL,
[GameFileId] [int] NOT NULL,
[CreatedOn] [datetimeoffset](7) NOT NULL,
[EventTypeId] [tinyint] NOT NULL,
[Message] [nvarchar](max) NULL,
[Code] [int] NULL,
[Violation] [nvarchar](100) NULL,
[ClientName] [nvarchar](100) NULL,
[ClientGuid] [nvarchar](50) NULL,
[ClientGuidReversed] [nvarchar](50) NULL,
[ClientIpAndPort] [nvarchar](50) NULL,
CONSTRAINT [PK_LogEntries] PRIMARY KEY CLUSTERED
(
[LogEntryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Cheers :)
Damn sorry. forgot to define a duplicate. The LogEntryId is unique, so ignore that piece of info (it's not imported). all the rest of the data is imported. Here's two rows of data that are identical.
6459749 39 2010-11-05 00:00:25.0000000 +11:00 6 Violation (MULTIHACK) #70805 70805 MULTIHACK angelb aeda202c22ed41f7301d0673647c55d8 8d55c7463760d1037f14de22c202adea 220.246.157.194:57133
6459766 39 2010-11-05 00:00:25.0000000 +11:00 6 Violation (MULTIHACK) #70805 70805 MULTIHACK angelb aeda202c22ed41f7301d0673647c55d8 8d55c7463760d1037f14de22c202adea 220.246.157.194:57133
and to compare this to the top 5 ordered by desc
6505931 40 2010-11-08 23:39:16.0000000 +11:00 4 NULL NULL NULL Zaphrolio 69ae1bfea616c244e5c223e51d5ceb8e e8bec5d15e322c5e442c616aefb1ea96 175.38.209.80:10000
6505930 39 2010-11-08 23:39:04.0000000 +11:00 3 NULL NULL NULL imBakedAsBro 8cf1b3b6a389229fa4adeec07dc087ce ec780cd70ceeda4af922983a6b3b1fc8 110.175.83.45:10000
6505929 39 2010-11-08 23:39:03.0000000 +11:00 2 NULL NULL NULL imBakedAsBro NULL NULL 110.175.83.45:10000
6505928 80 2010-11-08 23:39:04.0000000 +11:00 4 NULL NULL NULL Asmo74 5ccf5ee85a6cf08da563bdcbfe75351d d15357efbcdb365ad80fc6a58ee5fcc5 61.68.212.231:50273
6505927 80 2010-11-08 23:39:03.0000000 +11:00 4 NULL NULL NULL McJellyfish c48218542918bec900a331a81e0a9d05 50d9a0e18a133a009ceb81924581284c 60.225.3.2:10000
Upvotes: 1
Views: 1746
Reputation: 294277
with cte as (
select row_number() over (
partition by
[GameFileId]
, [CreatedOn]
, [EventTypeId]
, [Message]
, [Code]
, [Violation]
, [ClientName]
, [ClientGuid]
, [ClientGuidReversed]
, [ClientIpAndPort]
order by [LogEntryId]) as rn
from LogEntries)
delete from cte
where rn > 1;
Upvotes: 5
Reputation: 5021
in the abscence of any other information, UNION ALL is normally a good trick to blat out dupes
select * from table
union
select * from table
Edited to reflect typo in comment...
Upvotes: 0