Reputation: 53
I need to find matched pairs of records in SQL Server, but each record can only be included in 1 pair. Once a record has been matched with a pair, it should be removed from consideration for any future pairs.
I have tried solutions involving ROW_NUMBER()
and LEAD()
, but i just can't quite get there.
This will be used to pair financial accounts with similar accounts for review, based on multiple customer attributes such as credit score, income, etc.
Statement:
declare @test table (ID numeric, Color varchar(20))
insert into @test values
(1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red')
select*
from @test t1
join @test t2
on t1.Color = t2.Color
and t1.ID < t2.ID -----removes reverse-pairs and self-pairs
Current results:
ID Color ID Color
--- ------- --- --------
1 Blue 3 Blue
1 Blue 5 Blue -----should not appear because 1 has already been paired
3 Blue 5 Blue -----should not appear because 3 and 5 have already been paired
2 Red 6 Red
Needed results:
ID Color ID Color
--- ------- --- --------
1 Blue 3 Blue
2 Red 6 Red
Upvotes: 5
Views: 809
Reputation: 5922
Editing with Max comments
Here is one way to get this done..
I first rank the records on the basis of color with the lowest id with rnk=1, next one with rnk=2.
After that i join the tables together by pulling the rnk=1 records and joining then with rnk=2.
declare @test table (ID numeric, Color varchar(20))
insert into @test values
(1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red'),(7,'Blue')
;with data
as (select row_number() over(partition by color order by id asc) as rnk
,color
,id
from @test
)
select a.id,a.color,b.id,b.color
from data a
join data b
on a.Color=b.Color
and b.rnk=a.rnk+1
where a.rnk%2=1
i get the output as follows
+----+-------+----+-------+
| id | color | id | color |
+----+-------+----+-------+
| 1 | Blue | 3 | Blue |
| 5 | Blue | 7 | Blue |
| 2 | Red | 6 | Red |
+----+-------+----+-------+
Upvotes: 2
Reputation: 222482
You could use row_number()
and conditional aggregation:
select
max(case when rn % 2 = 0 then id end) id1,
max(case when rn % 2 = 0 then color end) color1,
max(case when rn % 2 = 1 then id end) id2,
max(case when rn % 2 = 1 then color end) color2
from (
select
t.*,
row_number() over(partition by color order by id) - 1 rn
from @test t
) t
group by color, rn / 2
having count(*) = 2
The subquery ranks records having the same color
by increasing id
. Then, the outer query groups pairwise, and filters on groups that do contain two records.
id1 | color1 | id2 | color2 :-- | :----- | :-- | :----- 1 | Blue | 3 | Blue 2 | Red | 6 | Red
Upvotes: 1