VM1
VM1

Reputation: 399

Assign a group name based on identical groups of rows using SQL Server

I have a table (might end up using SQL Server) that looks like this:

enter image description here

I would like to put products into a final grouping based on same criteria.

For example prod-A, prod-B, prod-X all belong to same group as their criteria rows are same (can't be partial match).

Hence I would like the output to be something like this

enter image description here

Any help will be appreciated. Thanks!

Upvotes: 0

Views: 1274

Answers (2)

kjmerf
kjmerf

Reputation: 4345

This one is possibly the most inefficient query ever written BUT it does do the job:

SELECT t.product, t.criteria, final.final_grouping
FROM t
INNER JOIN
(SELECT p.product, groups.final_grouping
FROM
(SELECT DISTINCT product, STUFF(
             (SELECT DISTINCT ',' + criteria 
              FROM t sub
              WHERE t.product = sub.product
              FOR XML PATH ('')
              )
             , 1, 1, '') AS criterias
FROM t
GROUP BY product) p
INNER JOIN
(SELECT ROW_NUMBER() OVER(ORDER BY criterias) AS final_grouping, c.criterias
FROM
(SELECT DISTINCT STUFF(
             (SELECT DISTINCT ',' + criteria 
              FROM t sub
              WHERE t.product = sub.product
              FOR XML PATH ('')
              )
             , 1, 1, '') AS criterias
FROM t) c) groups
ON p.criterias = groups.criterias) final
ON t.product = final.product

You can validate here: http://rextester.com/SZUECY8704

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269873

In SQL Server, this is trickier. You can use similar logic as in Postgres, but the hack for using XML is yuckier.

Instead, let's start by getting all pairs of exact matches:

with i as (
      select i.*, count(*) over (partition by product) as nump
      from input
     )
select i1.product, i2.product
from i i1 join
     i i2
     on i1.criteria = i2.criteria and
        i1.nump = i2.nump
group by i1.product, i2.product
having count(*) = i1.nump;  -- everything matches

From here, we can aggregate to get the ranking:

with i as (
      select i.*, count(*) over (partition by product) as nump
      from input
     )
select i1.product, min(i2.product) as grp_product,
       dense_rank() over (order by min(i2.product)) as grp
from (select i1.product, i2.product
      from i i1 join
           i i2
           on i1.criteria = i2.criteria and
              i1.nump = i2.nump
      group by i1.product, i2.product
      having count(*) = i1.nump  -- everything matches
     ) ii
group by i1.product;

For each product, this now assigns the group number.

You can use a join to assign the value to each row.

Although you can use the same technique in Postgres, I think array aggregation is a simpler approach.

Upvotes: 2

Related Questions