Alex Crooks
Alex Crooks

Reputation: 97

Finding the closest passing points in a series of GPS data

I have a table with the following schema:

                Table "public.gps_log"
          Column           |            Type             
---------------------------+-----------------------------
 id                        | integer                     
 logged_at                 | timestamp without time zone 
 lonlat                    | geography(Point,4326)       

And another with the following schema:

               Table "public.waypoint"
          Column           |            Type             
---------------------------+-----------------------------
 id                        | integer                     
 lonlat                    | geography(Point,4326)       
 range                     | numeric

gps_log contains a per-minute position report from a GPS and waypoint contains a set of waypoints with their positions.

I have the following query:

SELECT gps_log.id as id1, waypoint.id as id2, ST_Distance(gps_log.lonlat, waypoint.lonlat) as dist
  FROM gps_log, waypoint
  WHERE ST_DWithin(gps_log.lonlat, waypoint.lonlat, waypoint.range)
  ORDER BY id1;

That returns the following result:

  id1  | id2  |      dist      
-------+------+----------------
  4499 | 1118 | 2580.557160943  <- first and closest dist in this cluster of id2=1118
  4500 | 1118 | 2580.557160943
  4501 | 1119 | 3861.038787463  <- etc. ...
  4502 | 1119 | 3861.038787463
  4503 | 1118 |  885.481236082  <-
  4504 | 1118 |  885.481236082
  4505 | 1119 | 2114.289192152  <-
  4506 | 1119 | 2114.289192152
  4507 | 1118 | 3209.147139384
  4508 | 1118 | 3209.147139384
  4510 | 1118 | 2194.494307877  <-
  4511 | 1118 | 2194.494307877
  4514 | 1118 | 2253.163728865
  4515 | 1117 | 3155.907772254  <-
  4518 | 1117 | 3875.930499045
  4519 | 1118 | 3834.344459575  <-
  4522 | 1117 | 2025.333877603  <-
  4523 | 1117 | 2025.333877603
  4603 | 1116 | 1075.801799628
  4604 | 1116 | 1075.801799628
  4607 | 1116 |  663.907042351
  4608 | 1116 |  663.907042351
  4611 | 1116 |  319.142003353  <-
  4612 | 1116 |  319.142003353
  4614 | 1116 |   535.03813233
  4615 | 1116 |   535.03813233
  4630 | 1132 | 2134.348193208
  4631 | 1181 |  2165.19731156  <-
  4784 | 1134 |  337.398349813  <-
  4785 | 1134 |  337.398349813
  4788 | 1135 | 1388.859874755  <-
  4789 | 1135 | 1388.859874755

I would ultimately like the query to return the GPS position report that most closely passes each waypoint, and a waypoint can be passed multiple times.

The arrows above represent the records I would like to return as a result of the query - basically each time there is a new cluster of waypoint IDs, the position report with the nearest distance to the waypoint is returned.

Right now, my solution is running the above query, and then processing the results in code to loop through the array and find the correct values.

Upvotes: 1

Views: 73

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270713

This is a variant of the "gaps-and-islands" problem. You can identify the groups using a different of row numbers. You can then extract the row with the minimum distance using DISTINCT ON:

WITH d as (
      SELECT gps_log.id as id1, w.id as id2, ST_Distance(gl.lonlat, w.lonlat) as dist
      FROM gps_log gl JOIN
           waypoint w
           ON ST_DWithin(ol.lonlat, w.lonlat, w.range)
     )
SELECT DISTINCT ON (id2, grp) d.*
FROM (SELECT d.*,
             (ROW_NUMBER() OVER (ORDER BY id1) - 
              ROW_NUMBER() OVER (PARTITION BY id2 ORDER BY id1)
             ) as grp
      FROM d
     ) d
ORDER BY id2, grp, dist;

Understanding why the difference of row numbers works is rather tricky. I would advise you to run the subquery producing the two row number values. You can then see how the difference defines the groups that you want.

Upvotes: 1

Related Questions