James
James

Reputation: 572

SQL Server query to eliminate duplicates based on single column where another column may differ

My table looks like this:

Supplier        ReferenceID         Description        Total    
------------------------------------------------------------
smiths          BP657869510L        Order 67543         42
smiths          BP657869510L        Order 67543B        42
smiths          BP654669517L        No. 5621            13
smiths          BP654669517L        No. 56211           13
corrigan        15:51               Order 23542         23
corrigan        15:51               Order 235422        23
williams        14015               Block B             19
williams        14015               Block B2            19

I would like to write a T-SQL query to return the list of transactions with each supplier, eliminating duplicate entries based on the ReferenceID column. As you can see from the table, the Description value may be different in two columns with the same ReferenceID (due to data entry error). In this case, if possible, I would like to return one of these Description values (I don't care which one).

So the results I would want to return based on the data above would be (I picked the Description values randomly - I don't have a preference as to which one is returned as long as it is tied to the ReferenceID in the original table.)

    Supplier     ReferenceID        Description        Total    
    --------------------------------------------------------
    smiths       BP657869510L       Order 67543         42
    smiths       BP654669517L       No. 5621            13
    corrigan     15:51              Order 23542         23
    williams     14015              Block B             19

I realise this is quite complex but any suggestions appreciated!

Upvotes: 0

Views: 128

Answers (4)

mohan111
mohan111

Reputation: 8865

DECLARE @TableName TABLE 
    ([Supplier] varchar(8), [ReferenceID] varchar(12), [Description] varchar(12), [Total] int)
;

INSERT INTO @TableName
    ([Supplier], [ReferenceID], [Description], [Total])
VALUES
    ('smiths', 'BP657869510L', 'Order 67543', 42),
    ('smiths', 'BP657869510L', 'Order 67543B', 42),
    ('smiths', 'BP654669517L', 'No. 5621', 13),
    ('smiths', 'BP654669517L', 'No. 56211', 13),
    ('corrigan', '15:51', 'Order 23542', 23),
    ('corrigan', '15:51', 'Order 235422', 23),
    ('williams', '14015', 'Block B', 19),
    ('williams', '14015', 'Block B2', 19)
;

;WITH CTE AS
(
Select distinct t.Supplier,tt.ReferenceID,t.[Description],Total
 from @TableName t
INNER JOIN 
(Select distinct MIN(Supplier)as r,ReferenceID,MIN([Description]) as k
 From @TableName
GROUP BY ReferenceID )tt
ON tt.ReferenceID = t.ReferenceID
AND t.Supplier = tt.r
AND t.Description = tt.k
GROUP BY t.Supplier,t.[Description],tt.ReferenceID,t.Total
)
Select C.Supplier,C.ReferenceID,C.Description,SUM(C.Total) FROM CTE C
GROUP BY C.Supplier,C.ReferenceID,C.Description
ORDER BY C.Supplier 

Upvotes: 0

Bob
Bob

Reputation: 1055

If you want to pull one record per ReferenceID while maintaining the integrity of the record, then you can use dense_rank() like this.

WITH Table_CTE (Supplier, ReferenceID, Description, Total, RefRank) AS
(
  SELECT Supplier, ReferenceID, Description, Total,
         DENSE_RANK() OVER(PARTITION BY ReferenceID ORDER BY Description) As RefRank
  FROM Table
)
SELECT Supplier, ReferenceID, Description, Total
FROM Table_CTE
WHERE RefRank = 1

Using GROUP BY as suggested by others might be a simpler solution.

Upvotes: 0

Daniel E.
Daniel E.

Reputation: 2059

SELECT Supplier
    ,ReferenceID
    ,MAX(Description) [Description]
    ,SUM(TOTAL) [TOTAL] --or MIN/MAX/AVG etc...
FROM TABLE
GROUP BY Supplier
    ,ReferenceID

Upvotes: 0

Tim Schmelter
Tim Schmelter

Reputation: 460098

You just need to GROUP BY ReferenceID, if the rest is arbitrary/equal you can use MAX or MIN:

SELECT Supplier = MIN(Supplier),
       ReferenceID,        
       Description = MIN(Description),        
       Total = MIN(Total)
FROM dbo.Tablename
GROUP BY ReferenceID

Demo

Upvotes: 2

Related Questions