Reputation: 1827
We have done a translations api were our users can add translations to the system. A bug in the api added duplicate rows which i need to remove.
The translationConnection_Id combined with TranslationCompanyKey_Id is a key and therefor should not be able to be duplicate. I since i'm a real sucker at SQL i need some help to create a script to remove all duplicates but saves one of the rows.
SELECT TOP 1000 [Id]
,[Value]
,[TranslationConnection_Id]
,[TranslationCompanyKey_Id]
FROM [AAES_TRAN].[dbo].[Translations]
Upvotes: 1
Views: 76
Reputation: 117
I think you would like to keep one record from the duplicate records.
This code will delete duplicate records but will keep one records. Try this:
DELETE FROM TestTranslationTable WHERE ID IN(
SELECT Id FROM TestTranslationTable tblMain
WHERE tblMain.Id NOT IN (SELECT TOP 1 tblNested.id FROM TestTranslationTable tblNested WHERE
tblNested.TranslationConnectionId=tblMain.TranslationConnectionId ORDER BY tblNested.TranslationConnectionId)
AND tblMain.TranslationConnectionId IN(SELECT TranslationConnectionId FROM TestTranslationTable
GROUP BY TranslationConnectionId HAVING COUNT(TranslationConnectionId)>1 )
)
GO
Upvotes: 0
Reputation: 77934
You can use ROW_NUMBER()
analytic function for this like
SELECT * FROM (
SELECT TOP 1000 [Id]
,[Value]
,[TranslationConnection_Id]
,[TranslationCompanyKey_Id]
,ROW_NUMBER() OVER(PARTITION BY [Value] ORDER BY [Value]) AS rn
FROM [AAES_TRAN].[dbo].[Translations] ) xxx
WHERE rn = 1;
Upvotes: 0
Reputation: 2878
I think this will work. Try the SELECT part first to be sure.
DELETE FROM tblTranslations WHERE ID IN (
SELECT MAX(ID) FROM tblTranslations
GROUP BY TranslationConnection_Id, TranslationCompanyKey_Id
HAVING COUNT(*) > 1
)
Upvotes: 5