R Rommel
R Rommel

Reputation: 11

POSTGIS limiting a distance query by a threshold

I am looking to get pairwise distances between rows from two tables and only return the pairs that are within a threshold distance. The tables have large numbers of objects so performance is a concern.

I found an example of getting the closest distances at PostGIS minimum distance between two large sets of points

The code there looks like

SELECT 
    a.id, nn.id AS id_nn, 
    a.geom, nn.geom_closest, 
    ST_Distance_Sphere(a.geom, nn.geom_closest) AS min_dist
FROM 
    table_a AS a
    CROSS JOIN LATERAL
        (SELECT
            b.id, 
            b.geom AS geom_closest
        FROM table_b b
        ORDER BY a.geom <-> b.geom
        LIMIT 1) AS nn;

I'm terrible with SQL and I understand that the LIMIT 1 is taking the closest when they are ordered.

How do I modify this to give all pairs less than a threshold? I tried using a WHERE clause to limit it to within a value

SELECT 
    a.id, nn.id AS id_nn, 
    a.wkb_geometry, nn.geom_closest, 
    ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) AS min_dist
FROM 
    mammography21 AS a
    CROSS JOIN LATERAL
        (SELECT
            b.gid as id, 
            b.wkb_geometry AS geom_closest
        FROM cartographic_boundary_us_zcta_2016 b
        ORDER BY a.wkb_geometry <-> b.wkb_geometry) AS nn
WHERE ST_DistanceSphere(a.wkb_geometry, nn.geom_closest) <= 10.0;

but this gives the error: The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application. Even it had worked, I would guess it is an inefficient approach. How should I be trying to do this query?

Upvotes: 1

Views: 344

Answers (1)

JGH
JGH

Reputation: 17906

The ordering was used to get the closest point only. If you want more than one point, you don't really care about ordering the points by distance anymore.

Putting the distance validation is the right thing to do... but beware where you do it. Your query is taking too much time because for every point, it computes the distance to every other point (the lateral join) and then it filters the results to keep only the nearby ones.

Since you need more than just 1 point, the later join can be removed. As suggested by @thibautg, st_Dwithin is preferable as it makes use of the spatial index.

At last, you might want to cast your data to geography to compute the distance in meters (it depends on your data CRS). If it is the case, you would need a spatial index on the geographies too.

SELECT 
    a.id, nn.id AS id_nn, 
    a.geom srcGeom, nn.geom nearGeom, 
    ST_DistanceSphere(a.geom, nn.geom) AS near_dist
FROM 
    mammography21 AS a,
    cartographic_boundary_us_zcta_2016 nn
WHERE ST_DWithin(a.geom, nn.geom, 10);

Upvotes: 1

Related Questions