Reputation: 1175
I have a table called customers which gets info populated from a form not all fields are required (this is because the form generates asf / xml with the inputted info) and I would like to be able to merge duplicates into one row then delete the duplicates.
Here is my table
CID | LastName | FirstName | Street | City | ZipCode | HomePhone | CellPhone | EmailAddr
1 Test NULL NULL NULL NULL NULL NULL NULL
2 NULL TEST NULL NULL NULL NULL NULL NULL
3 NULL NULL Test NULL NULL NULL NULL NULL
4 NULL NULL NULL Test NULL NULL NULL NULL
5 NULL NULL NULL NULL Test NULL NULL NULL
6 NULL NULL NULL NULL NULL Test NULL NULL
7 NULL NULL NULL NULL NULL NULL TEST NULL
8 NULL NULL NULL NULL NULL NULL NULL TEST
I want to merge the data from each field that isn't null into the Fist Instance of then update that record and delete the remaining 7 records.
I am still starting out in SQL but understand joins, inserts, updates deletes etc. Any advice or direction would be greatly appreciated. I have found multiple posts where I can merge this data in a report but not to many where I can actually truly merge the data and delete the duplicate rows.
I just found this post while searching so it may be what I am looking for mysql-consolidate-duplicate-data-records-via-update-delete
Upvotes: 1
Views: 3363
Reputation: 121902
Try this one -
SET NOCOUNT ON;
DECLARE @temp TABLE
(
CID INT PRIMARY KEY
, LastName NVARCHAR(10)
, FirstName NVARCHAR(10)
, Street NVARCHAR(10)
, City NVARCHAR(10)
, ZipCode NVARCHAR(10)
, HomePhone NVARCHAR(10)
, CellPhone NVARCHAR(10)
, EmailAddr NVARCHAR(10)
)
INSERT INTO @temp (CID, LastName, FirstName, Street, City, ZipCode, HomePhone, CellPhone, EmailAddr)
VALUES
(1, 'Test', NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(2, NULL, 'TEST', NULL, NULL, NULL, NULL, NULL, NULL),
(3, NULL, NULL, 'Test', NULL, NULL, NULL, NULL, NULL),
(4, NULL, NULL, NULL, 'Test', NULL, NULL, NULL, NULL),
(5, NULL, NULL, NULL, NULL, 'Test', NULL, NULL, NULL),
(6, NULL, NULL, NULL, NULL, NULL, 'Test', NULL, NULL),
(7, NULL, NULL, NULL, NULL, NULL, NULL, 'TEST', NULL),
(8, NULL, NULL, NULL, NULL, NULL, NULL, NULL, 'TEST'),
(12, 'Tes2', NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(14, NULL, 'TES2', NULL, NULL, NULL, NULL, NULL, NULL),
(17, NULL, NULL, 'Tes2', NULL, NULL, NULL, NULL, NULL),
(18, 'Tes3', NULL, NULL, NULL, NULL, NULL, NULL, NULL),
(19, NULL, 'TES3', NULL, NULL, NULL, NULL, NULL, NULL),
(20, NULL, NULL, 'Tes3', NULL, NULL, NULL, NULL, NULL),
(21, NULL, NULL, NULL, 'Test3', NULL, NULL, NULL, NULL)
DECLARE @buffer_temp TABLE
(
CID INT PRIMARY KEY
, LastName NVARCHAR(50)
, FirstName NVARCHAR(50)
, Street NVARCHAR(50)
, City NVARCHAR(50)
, ZipCode NVARCHAR(50)
, HomePhone NVARCHAR(50)
, CellPhone NVARCHAR(50)
, EmailAddr NVARCHAR(50)
)
;WITH cte AS
(
SELECT t.CID, NextCID = ISNULL(t2.CID, (SELECT MAX(y.CID) FROM @temp y))
FROM @temp t
OUTER APPLY (
SELECT TOP 1 CID = t1.CID - 1
FROM @temp t1
WHERE t1.CID > t.CID
AND t1.LastName IS NOT NULL
) t2
WHERE t.LastName IS NOT NULL
)
INSERT INTO @buffer_temp
SELECT
t2.CID
, LastName = MAX(LastName)
, FirstName = MAX(FirstName)
, Street = MAX(Street)
, City = MAX(City)
, ZipCode = MAX(ZipCode)
, HomePhone = MAX(HomePhone)
, CellPhone = MAX(CellPhone)
, EmailAddr = MAX(EmailAddr)
FROM @temp t
CROSS APPLY (
SELECT *
FROM cte t2
WHERE t.CID BETWEEN t2.CID AND t2.NextCID
) t2
GROUP BY t2.CID
DELETE FROM @temp
INSERT INTO @temp
SELECT *
FROM @buffer_temp
SELECT *
FROM @temp
Output:
CID LastName FirstName Street City ZipCode HomePhone CellPhone EmailAddr
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1 Test TEST Test Test Test Test TEST TEST
12 Tes2 TES2 Tes2 NULL NULL NULL NULL NULL
18 Tes3 TES3 Tes3 Test3 NULL NULL NULL NULL
Upvotes: 1
Reputation: 1269563
It looks like you want to merge records 1-8, then 9-16, then 17-24, and so on.
Fortunately, you have a CID
field that you can use for identifying the groups. All you need is the group, and the formula (CID - 1)/8
does the trick (SQL Server does integer division when dividing integers so, say, 4/8 = 0 and not 0.5). Here is the query:
select (CID - 1) / 8 as NewCID,
max(LastName) as LastName, max(FirstName) as FirstName, . . .
from t
group by (CID - 1) / 8;
Upvotes: 0