Kyle Pennell
Kyle Pennell

Reputation: 6097

How to find only the distance to the closest linestring from a polygon in BigQuery?

I have some land parcel polygons and I have some powerline linestrings. For each parcel, I'm trying to find only the distance to the closest linestring.

This query runs but takes quite a long time:

SELECT 
  a.OBJECTID,
  st_distance(a.geometry, b.geometry) as distance

FROM `kpennell.KYLETEST.parcelpoly` a
join `kpennell.KYLETEST.powerlines` b
ON st_dwithin(a.geometry, b.geometry, 1000) 

I'm guessing I need to use Paul Ramsay Or Michael Entin's nearest neighbors things, but I can't seem to get the query right. The latter example finds just the id of the closest one but not the actual distance (I think).

SELECT 
  a.id, 
  ARRAY_AGG(b.id ORDER BY ST_Distance(a.geog, b.geog) LIMIT 1)
      [ORDINAL(1)] as neighbor_id
FROM people_table a JOIN restaurant_table b
ON ST_DWithin(a.geog, b.geom, 100) -- 100 is search radius
GROUP BY a.id

The Parcels are Polygons with a unique object id parcels

The Powerlines are linestrings with a unique object id enter image description here

What I'm hoping to find, for each parcel, is the parcel, and then the distance to the closest powerline. So for each parcel, I want to get the parcel OBJECTID and then distance to the closest powerline.

I'm trying to do this efficiently where it stops when it finds the nearest neighbor for each one.

Upvotes: 0

Views: 767

Answers (1)

John Moutafis
John Moutafis

Reputation: 23134

We can combine the distance with the id using a STRUCT:

SELECT 
  poly.id, 
  ARRAY_AGG(STRUCT(pline.id, ST_Distance(poly.geog, pline.geog)) 
    ORDER BY ST_Distance(poly.geog, pline.geog) LIMIT 1
  ) [ORDINAL(1)] as neighbor
FROM 
    `kpennell.KYLETEST.parcelpoly` poly JOIN 
    `kpennell.KYLETEST.powerlines` pline ON 
     ST_DWithin(poly.geog, pline.geom, 100) -- 100 is search radius
GROUP BY poly.id

There are other options though:

If we know the parcelpoly id before the query (from an input form etc.) we can have a simplified query:

SELECT 
  poly.id,
  pline.id,
  st_distance(poly.geometry, pline.geometry) as distance
FROM `kpennell.KYLETEST.parcelpoly` AS poly,
     `kpennell.KYLETEST.powerlines` AS pline
WHERE poly.id = <known_parcelpoly_id> AND st_dwithin(poly.geometry, pline.geometry, 1000)
ORDER BY st_distance(poly.geometry, pline.geometry) 
LIMIT 1

Finally, we can use PostGIS's knn operator (<->):

SELECT 
  poly.id,
  pline.id,
  st_distance(poly.geometry, pline.geometry) as distance
FROM `kpennell.KYLETEST.parcelpoly` AS poly,
     `kpennell.KYLETEST.powerlines` AS pline
WHERE poly.id = <known_parcelpoly_id> AND st_dwithin(poly.geometry, pline.geometry, 1000)
ORDER BY poly.geometry <-> pline.geometry
LIMIT 1

There is a caveat though here!
The <-> operator:

For PostgreSQL below 9.5 only gives centroid distance of bounding boxes and for PostgreSQL 9.5+, does true KNN distance search giving true distance between geometries, and distance sphere for geographies.

which means that for PostgreSQL < 9.5 we may lose accuracy with that method.

Upvotes: 2

Related Questions