Reputation: 41
I have a many-many relationship table, and I want to find the overlapped groups and merge them into one.
In the example below, user 2
is in groups 7
and 8
, so groups 7
and 8
should be merged into one that contains the records 1
, 2
, 4
. The merged group id can be either 7
or 8
, it doesn't matter.
user_id | group |
---|---|
1 | 7 |
2 | 7 |
2 | 8 |
4 | 8 |
5 | 9 |
6 | 9 |
I wish to see output like this:
user_id | group |
---|---|
1 | 7 |
2 | 7 |
4 | 7 |
5 | 9 |
6 | 9 |
Upvotes: 0
Views: 428
Reputation: 2746
This is similar to the one asked earlier, where-in grouping needs to be done to the top level in hierarchy.
The below query aggregates user_id based on group_id into array and then compares those arrays with each other. When two arrays match they both get same group id. Once arrays match and they have been assigned their parent group id based on minimum group value, we need to get the top of the hierarchy. There could also be multiple hierarchies in the data-set, so we set starting point of each hierarchy as NULL. Lastly, we use hierarchical query to get the final grouping.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(23,52),
(24,51),(24,52),(25,26)
),cte_1 as
(select group_id,array_agg(user_id) arr
from data
group by group_id
), cte_2 as
(select c1.group_id g1, c2.group_id g2 ,
c1.arr arr1, c2.arr arr2,
case when arrays_overlap(arr1, arr2) then g1 end flag,
min(flag) over (partition by g2) grp,
case when g2 <> grp then grp end final_grp
from cte_1 c1, cte_1 c2
), cte_3 as
(select distinct g2, connect_by_root g2 as parent from cte_2
start with final_grp is null
connect by final_grp = prior g2
order by g2
), cte_4 as
(select c3.parent, c1.arr
from cte_1 c1 left join cte_3 c3
where c1.group_id = c3.g2
) select distinct value, parent as final_group
from cte_4,
lateral flatten(input=>arr)
order by value;
VALUE | FINAL_GROUP |
---|---|
1 | 7 |
2 | 7 |
4 | 7 |
5 | 7 |
6 | 7 |
21 | 51 |
22 | 51 |
23 | 51 |
24 | 51 |
25 | 26 |
70 | 7 |
Adding another query, that is simpler.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(22,52),
(22,53),(23,52),(25,26)
), cte_1 as
(select a.group_id grp1, b.group_id grp2
from data a, data b
where a.user_id = b.user_id
and a.group_id < b.group_id
), cte_2 as
(select grp2, connect_by_root grp1 as parent
from cte_1
start with grp1 not in (select grp2 from cte_1)
connect by grp1 = prior grp2
) select a.user_id,
coalesce(b.parent, a.group_id) final_grp
from data a left join cte_2 b
on a.group_id = b.grp2;
Upvotes: 2
Reputation: 41
Answering my own question here, below is the SQL I built that fits my needs. This is inspired by @pankaj 's answer.
with data(user_id,group_id) as (
select * from values
(1,7),(2,7),(2,8),(4,9),(5,9),(5,8),
(6,9),(70,8),(21,51),(22,51),(23,52),
(24,51),(24,52),(25,26)
), group_members as (
select
group_id, array_agg(user_id) users
from data
group by group_id
), overlapped_group as (
select
c1.group_id g1,
c2.group_id g2,
-- c1.users,
-- c2.users,
least(g1, coalesce(g2, g1)) as min_group,
min(min_group) over (partition by g2) as merge_to
from group_members c1
left join
group_members c2 on arrays_overlap(c1.users, c2.users)
and g1 <> g2
), merge_mapping as (
select distinct
g1 as group_id,
iff(g2 is null, g1, min(merge_to) over (partition by g1)) as merge_to
from overlapped_group
)
select
user_id,
m.merge_to as group_id
from data
left join merge_mapping m using(group_id);
Upvotes: 2
Reputation: 384
One way:
select user_id, STRTOK(listagg(group, ', ') within group (ORDER BY user_id ),',',1)
from <table>
GROUP BY user_id ORDER BY user_id;
Upvotes: 0