Reputation: 624
I have a table as shown below
record similar_record
rec_1 rec_2
rec_3 rec_4
rec_2 rec_3
rec_5 rec_7
The above data shows which two records are similar. Eg: In the above dataset, rec_1 is similar to rec_2, rec_2 is similar to rec_3 and rec_3 is similar to rec_4, hence they have to go to one group. rec_5 and rec_7 are similar hence they form a group. We have to generate the group identifiers, they need not be integers.
I am trying to write a SQL query on MySQL to generate the following output.
group record
1 rec_1
1 rec_2
1 rec_3
1 rec_4
2 rec_5
2 rec_7
The records need not be in a separate row, it is OK if the result is obtained by GROUP_CONCAT with some delimiter per group.
Can anyone help me with the query?
Upvotes: 1
Views: 57
Reputation: 31772
Here is kind of recursive brute force method. Works with MySQL 8. Should also work with MariaDB 10.2:
create table graph (
node1 varchar(50),
node2 varchar(50)
);
insert into graph (node1, node2) values
('rec_1', 'rec_2'),
('rec_3', 'rec_4'),
('rec_2', 'rec_3'),
('rec_5', 'rec_7');
with recursive numerated as (
select g.*, ROW_NUMBER() OVER (PARTITION BY null ORDER BY node1) as grp
from graph g
), normalized as (
select grp, node1 as node from numerated
union distinct
select grp, node2 as node from numerated
), rcte as (
select n.grp as grp1, n.*
from normalized n
union all
select rcte.grp1 as grp1, n2.grp, n2.node
from rcte
join normalized n1 on n1.node = rcte.node and n1.grp > rcte.grp
join normalized n2 on n2.node <> n1.node and n2.grp = n1.grp
), cte4 as (
select node, min(grp1) as grp1
from rcte
group by node
)
select DENSE_RANK() OVER (PARTITION BY null ORDER BY grp1) as grp, node
from cte4
order by grp, node;
Result:
grp | node
----|------
1 | rec_1
1 | rec_2
1 | rec_3
1 | rec_4
2 | rec_5
2 | rec_7
Demo: https://www.db-fiddle.com/f/wqhoqoNGEfZvFpVBHybUVx/0
Upvotes: 1