Find duplicate groups of rows in SQL Server

Question

I have a table with materials information where one material has from one to many constituents.

The table looks like this:

material_id contstiuent_id constituent_wt_pct
   1             1              10.5
   1             2              89.5
   2             1              10.5
   2             5              15.5
   2             7              74
   3             1              10.5
   3             2              89.5

Generally, I can have different material ID's with the same constituents (both ID's and weight percent), but also the same constituent id with the same weight percent can be in multiple materials.

I need to find the material ID's that have exactly the same amount of constituents, same constituents id's and same weight percent (in the example of data that will be material ID 1 and 3) What would be great is to have the output like:

ID Duplicate ID's
1 1,3
2 15,25
....

Just to clarify the question: I have several thousands of materials and it won't help me if I get just the id's of duplicate rows - I would like to see if it is possible to get the groups of duplicate material id's in the same row or field.

Mikael Eriksson · Accepted Answer

Build a XML string in a CTE that contains all constituents and use that string to figure out what materials is duplicate.

SQL Fiddle

MS SQL Server 2008 Schema Setup:

create table Materials
(
  material_id int, 
  constituent_id int, 
  constituent_wt_pct decimal(10, 2)
);


insert into Materials values
(1, 1, 10.5),
(1, 2, 89.5),
(2, 1, 10.5),
(2, 5, 15.5),
(2, 7, 74),
(3, 1, 10.5),
(3, 2, 89.5);

Query 1:

with C as
(
  select M1.material_id,
        (
        select M2.constituent_id as I,
                M2.constituent_wt_pct as P
        from Materials as M2
        where M1.material_id = M2.material_id
        order by M2.constituent_id,
                 M2.material_id
        for xml path('')
        ) as constituents
  from Materials as M1
  group by M1.material_id
)
select row_number() over(order by 1/0) as ID,
       stuff((
       select ','+cast(C2.material_id as varchar(10))
       from C as C2
       where C1.constituents = C2.constituents
       for xml path('')
       ), 1, 1, '') as MaterialIDs
from C as C1
group by C1.constituents
having count(*) > 1

Results:

| ID | MATERIALIDS |
--------------------
|  1 |         1,3 |

Find duplicate groups of rows in SQL Server

Answers (2)

Related Questions