ondrovic
ondrovic

Reputation: 1175

Duplicate rows with information merged and then remove duplicates

I have a table called customers which gets info populated from a form not all fields are required (this is because the form generates asf / xml with the inputted info) and I would like to be able to merge duplicates into one row then delete the duplicates.

Here is my table

CID | LastName | FirstName | Street | City | ZipCode | HomePhone | CellPhone | EmailAddr
 1     Test       NULL         NULL   NULL   NULL       NULL        NULL         NULL
 2     NULL       TEST         NULL   NULL   NULL       NULL        NULL         NULL
 3     NULL       NULL         Test   NULL   NULL       NULL        NULL         NULL
 4     NULL       NULL         NULL   Test   NULL       NULL        NULL         NULL
 5     NULL       NULL         NULL   NULL   Test       NULL        NULL         NULL
 6     NULL       NULL         NULL   NULL   NULL       Test        NULL         NULL
 7     NULL       NULL         NULL   NULL   NULL       NULL        TEST         NULL
 8     NULL       NULL         NULL   NULL   NULL       NULL        NULL         TEST

I want to merge the data from each field that isn't null into the Fist Instance of then update that record and delete the remaining 7 records.

I am still starting out in SQL but understand joins, inserts, updates deletes etc. Any advice or direction would be greatly appreciated. I have found multiple posts where I can merge this data in a report but not to many where I can actually truly merge the data and delete the duplicate rows.

I just found this post while searching so it may be what I am looking for mysql-consolidate-duplicate-data-records-via-update-delete

Upvotes: 1

Views: 3363

Answers (2)

Devart
Devart

Reputation: 121902

Try this one -

SET NOCOUNT ON;

DECLARE @temp TABLE
(
      CID INT PRIMARY KEY
    , LastName NVARCHAR(10)
    , FirstName NVARCHAR(10)
    , Street NVARCHAR(10)
    , City NVARCHAR(10)
    , ZipCode NVARCHAR(10)
    , HomePhone NVARCHAR(10)
    , CellPhone NVARCHAR(10)
    , EmailAddr NVARCHAR(10)
)

INSERT INTO @temp (CID, LastName, FirstName, Street, City, ZipCode, HomePhone, CellPhone, EmailAddr)
VALUES 
    (1,  'Test', NULL,   NULL,   NULL,   NULL,   NULL, NULL, NULL),
    (2,  NULL,   'TEST', NULL,   NULL,   NULL,   NULL, NULL, NULL),
    (3,  NULL,   NULL,   'Test', NULL,   NULL,   NULL, NULL, NULL),
    (4,  NULL,   NULL,   NULL,   'Test', NULL,   NULL, NULL, NULL),
    (5,  NULL,   NULL,   NULL,   NULL,   'Test', NULL, NULL, NULL),
    (6,  NULL,   NULL,   NULL,   NULL,   NULL,   'Test', NULL, NULL),
    (7,  NULL,   NULL,   NULL,   NULL,   NULL,   NULL, 'TEST', NULL),
    (8,  NULL,   NULL,   NULL,   NULL,   NULL,   NULL, NULL, 'TEST'),
    (12, 'Tes2',  NULL,  NULL,   NULL,   NULL,   NULL, NULL, NULL),
    (14, NULL,   'TES2', NULL,   NULL,   NULL,   NULL, NULL, NULL),
    (17, NULL,   NULL,   'Tes2', NULL,   NULL,   NULL, NULL, NULL),
    (18, 'Tes3', NULL,   NULL,   NULL,   NULL,   NULL, NULL, NULL),
    (19, NULL,   'TES3', NULL,   NULL,   NULL,   NULL, NULL, NULL),
    (20, NULL,   NULL,   'Tes3', NULL,   NULL,   NULL, NULL, NULL),
    (21, NULL,   NULL,   NULL,   'Test3', NULL,   NULL, NULL, NULL)

DECLARE @buffer_temp TABLE
(
      CID INT PRIMARY KEY
    , LastName NVARCHAR(50)
    , FirstName NVARCHAR(50)
    , Street NVARCHAR(50)
    , City NVARCHAR(50)
    , ZipCode NVARCHAR(50)
    , HomePhone NVARCHAR(50)
    , CellPhone NVARCHAR(50)
    , EmailAddr NVARCHAR(50)
)

;WITH cte AS 
(
    SELECT t.CID, NextCID = ISNULL(t2.CID, (SELECT MAX(y.CID) FROM @temp y))  
    FROM @temp t
    OUTER APPLY (
        SELECT TOP 1 CID = t1.CID - 1
        FROM @temp t1
        WHERE t1.CID > t.CID
            AND t1.LastName IS NOT NULL
    ) t2
    WHERE t.LastName IS NOT NULL
)
INSERT INTO @buffer_temp
SELECT 
      t2.CID
    , LastName = MAX(LastName) 
    , FirstName = MAX(FirstName)
    , Street = MAX(Street)
    , City = MAX(City)
    , ZipCode = MAX(ZipCode)
    , HomePhone = MAX(HomePhone)
    , CellPhone = MAX(CellPhone)
    , EmailAddr = MAX(EmailAddr) 
FROM @temp t
CROSS APPLY (
    SELECT *
    FROM cte t2
    WHERE t.CID BETWEEN t2.CID AND t2.NextCID
) t2
GROUP BY t2.CID

DELETE FROM @temp

INSERT INTO @temp
SELECT * 
FROM @buffer_temp

SELECT * 
FROM @temp

Output:

CID         LastName   FirstName  Street     City       ZipCode    HomePhone  CellPhone  EmailAddr
----------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
1           Test       TEST       Test       Test       Test       Test       TEST       TEST
12          Tes2       TES2       Tes2       NULL       NULL       NULL       NULL       NULL
18          Tes3       TES3       Tes3       Test3      NULL       NULL       NULL       NULL

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269563

It looks like you want to merge records 1-8, then 9-16, then 17-24, and so on.

Fortunately, you have a CID field that you can use for identifying the groups. All you need is the group, and the formula (CID - 1)/8 does the trick (SQL Server does integer division when dividing integers so, say, 4/8 = 0 and not 0.5). Here is the query:

select (CID - 1) / 8 as NewCID,
       max(LastName) as LastName, max(FirstName) as FirstName, . . . 
from t
group by (CID - 1) / 8;

Upvotes: 0

Related Questions