Find Next Closest Number in PostgreSQL

Question

I am running PostgreSQL 9.1.9 x64 with PostGIS 2.0.3 under Windows Server 2008 R2.

I have a table:

CREATE TABLE field_data.trench_samples (
   pgid SERIAL NOT NULL,
   trench_id TEXT,
   sample_id TEXT,
   from_m INTEGER
);

With some data in it:

INSERT INTO field_data.trench_samples (
   trench_id, sample_id, from_m
)
VALUES
   ('TR01', '1000001', 0),
   ('TR01', '1000002', 5),
   ('TR01', '1000003', 10),
   ('TR01', '1000004', 15),
   ('TR02', '1000005', 0),
   ('TR02', '1000006', 3),
   ('TR02', '1000007', 9),
   ('TR02', '1000008', 14);

Now, what I am interested in is finding the difference (distance in metres in this example) between a record's "from_m" and the "next" "from_m" for that trench_id.

So, based on the data above, I'd like to end up with a query that produces the following table:

pgid, trench_id, sample_id, from_m, to_m, interval
1, 'TR01', '1000001', 0, 5, 5
2, 'TR01', '1000002', 5, 10, 5
3, 'TR01', '1000003', 10, 15, 5
4, 'TR01', '1000004', 15, 20, 5
5, 'TR02', '1000005', 0, 3, 3
6, 'TR02', '1000006', 3, 9, 6
7, 'TR02', '1000007', 9, 14, 5
8, 'TR02', '1000008', 14, 19, 5

Now, you are likely saying "wait, how do we infer an interval length for the last sample in each line, since there is no "next" from_m to compare to?"

For the "ends" of lines (sample_id 1000004 and 1000008) I would like to use the identical interval length of the previous two samples.

Of course, I have no idea how to tackle this in my current environment. Your help is very much appreciated.

Gordon Linoff · Accepted Answer

Here is how you get the difference, using the one previous example at the end (as shown in the data but not explained clearly in the text).

The logic here is repeated application of lead() and lag(). First apply lead() to calculate the interval. Then apply lag() to calculate the interval at the boundary, by using the previous interval.

The rest is basically just arithmetic:

select trench_id, sample_id, from_m,
       coalesce(to_m,
                from_m + lag(interval) over (partition by trench_id order by sample_id)
               ) as to_m,
       coalesce(interval, lag(interval) over (partition by trench_id order by sample_id))
from (select t.*,
             lead(from_m) over (partition by trench_id order by sample_id) as to_m,
             (lead(from_m) over (partition by trench_id order by sample_id) -
              from_m
             ) as interval
      from field_data.trench_samples t
     ) t

Here is the SQLFiddle showing it working.

Find Next Closest Number in PostgreSQL

Answers (1)

Related Questions