Reputation: 11
I am looking to get pairwise distances between rows from two tables and only return the pairs that are within a threshold distance. The tables have large numbers of objects so performance is a concern.
I found an example of getting the closest distances at PostGIS minimum distance between two large sets of points
The code there looks like
SELECT
a.id, nn.id AS id_nn,
a.geom, nn.geom_closest,
ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM
table_a AS a
CROSS JOIN LATERAL
(SELECT
b.id,
b.geom AS geom_closest
FROM table_b b
ORDER BY a.geom <-> b.geom
LIMIT 1) AS nn;
I'm terrible with SQL and I understand that the LIMIT 1 is taking the closest when they are ordered.
How do I modify this to give all pairs less than a threshold? I tried using a WHERE clause to limit it to within a value
SELECT
a.id, nn.id AS id_nn,
a.wkb_geometry, nn.geom_closest,
ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) AS min_dist
FROM
mammography21 AS a
CROSS JOIN LATERAL
(SELECT
b.gid as id,
b.wkb_geometry AS geom_closest
FROM cartographic_boundary_us_zcta_2016 b
ORDER BY a.wkb_geometry <-> b.wkb_geometry) AS nn
WHERE ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) <= 10.0;
but this gives the error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application. Even it had worked, I would guess it is an inefficient approach. How should I be trying to do this query?
Upvotes: 1
Views: 344
Reputation: 17906
The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.
Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.
Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin
is preferable as it makes use of the spatial index.
At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.
SELECT
a.id, nn.id AS id_nn,
a.geom srcGeom, nn.geom nearGeom,
ST_DistanceSphere(a.geom, nn.geom) AS near_dist
FROM
mammography21 AS a,
cartographic_boundary_us_zcta_2016 nn
WHERE ST_DWithin(a.geom, nn.geom, 10);
Upvotes: 1