Reputation: 20780
I have two (very large) tables of identical structure, holding two types of locations :
and
Each of them hold several million rows. I need to select all locations in LocA and for each location, the closest location in LocB.
What would be the most efficient query to do this?
EDIT1 : The distance algorithm would be a dumb one : SQRT(POWER(LocB.X - LocA.X, 2) + POWER(LocB.Y - LocA.Y, 2))
EDIT2 : An implementation that I've done but I'm really not sure if it's optimal (I highly doubt it), would be :
SELECT A.Id AS AId,
( SELECT TOP 1 B.Id
FROM B
ORDER BY SQRT(POWER(B.X - A.X, 2) + POWER(B.Y - A.Y, 2)) ASC
) AS BId
FROM A
EDIT3 : It's common to have "duplicates" in table LocB but I would want any of the matching "closest" to be returned for a location in LocA, not all.
Upvotes: 3
Views: 293
Reputation: 3437
Have you thought to take into consideration geography::Point, STDistance method, and create a spatial index on those points columns?
If your database structure is fixed, you can add a new persisted computed column.
Upvotes: 2
Reputation: 45096
The SQRT is not going to change the ORDER - it is just overhead
SELECT A.Id AS AId,
( SELECT TOP 1 B.Id
FROM B
ORDER BY POWER(B.X - A.X, 2) + POWER(B.Y - A.Y, 2) ASC
) AS BId
FROM A
I am thinking there is a way to perform two passes
You know the distance is <= delta X + delta Y
And the maximum error in that approximation is SQRT(2) - 1
This does not deal with duplicates or ties
I suspect the extra IO is not going to make up for the reduced number of POWER calculations but it might be worth a try
Only worth a try if you have #temp on SSD
create #temp1
IDa
IDb
Xa
Ya
Xb
Yb
distSum
distAct
insert into #temp (IDa, IDb, Xa, Ya, Xb ,Yb, distSum)
select a.ID, b.ID, a.x, a.y, b.x, b.y, abs(a.X-b.X) + abs(a.Y-b.Y)
table as a
join table as b
on a.ID < b.ID
delete #temp
from #temp
join
(select IDa, min(distSum) as minDistSum from #temp group by IDa) as aMin
on #temp.IDa = aMin.IDa
and #temp.distSum > 1.414*(minDistSum)
update #temp
set distAct = POWER(Xa - Xb, 2) + POWER(Ya - Yb, 2)
Upvotes: 1
Reputation: 24
How about :
SELECT id as Aid,x,y,m % 100 as bId
FROM (
SELECT A.id,A.x,A.y,MIN(CAST(((A.x-B.x)*(A.x-B.x)+(A.y-B.y)*(A.y-B.y)) AS BIGINT)*100+B.id) as m
FROM A
CROSS JOIN B
GROUP BY A.id,A.x,A.y) j;
Upvotes: 0
Reputation: 69759
This is not likely to be very efficient, but at the moment I can't see a better way:
SELECT a.ID, a.X, a.Y, b.ID, b.X, b.Y, b.Distance
FROM LocA a
CROSS APPLY
( SELECT TOP 1 WITH TIES
b.ID,
b.X,
b.Y,
Distance = SQRT(POWER(b.X - a.X, 2) + POWER(b.Y - a.Y, 2))
FROM LocB b
ORDER BY Distance
) B;
Upvotes: 2
Reputation: 3108
This is the code :
WITH S AS (
SELECT *
FROM LOCA CROSS APPLY( select locb.id as ID_B, (POWER(LocB.X - LocA.X, 2) + POWER(LocB.Y - LocA.Y, 2)) D FROM LOCB ) S
)
SELECT DISTINCT ID,X,Y,d,ID_B
FROM S
where d=(select min(d) from s s1 where s1.ID=s.id)
Upvotes: 0