jasonmclose
jasonmclose

Reputation: 1695

SQL Query where I get most recent rows from timestamp from another table

I've got some sensory information going into a table. I have figured out the query that will tell me exactly when the value at a particular device changes.

What I need to know is the status of all of the other sensors at that time. The trick is, the timestamps won't be equal. I could get a data point from sensor 1, then 3 minute later, one from sensor 2, and then 30 seconds later, another from sensor 1.

So, here is an example of what I am talking about:

--- data_table ---

sensor | state | stime
-------+-------+---------------------
     1 |     A | 2014-08-17 21:42:00
     1 |     A | 2014-08-17 21:43:00
     2 |     B | 2014-08-17 21:44:00
     3 |     C | 2014-08-17 21:45:00
     2 |     D | 2014-08-17 21:46:00
     3 |     C | 2014-08-17 21:47:00
     1 |     B | 2014-08-17 21:48:00
     3 |     A | 2014-08-17 21:49:00
     2 |     D | 2014-08-17 21:50:00
     2 |     A | 2014-08-17 21:51:00

Now, I know the query that will deliver me the state changes. I've got this down, and it's in a view. That table would look like:

 --- state_changed_view ---

sensor | state | stime
-------+-------+---------------------
     2 |     D | 2014-08-17 21:46:00
     1 |     B | 2014-08-17 21:48:00
     3 |     A | 2014-08-17 21:49:00
     2 |     A | 2014-08-17 21:51:00 

What I want is a JOIN, where I can get all of the values of the 'state_changed_view', but also the values of the other corresponding sensors at the 'sensor_timestamp' within the view.

So, ideally, I want my result to look like (or something similar to):

sensor | state | stime               | sensor | state | stime
-------+-------+---------------------+--------+-------+---------------------
     2 |     D | 2014-08-17 21:46:00 |      1 |     A | 2014-08-17 21:43:00
     2 |     D | 2014-08-17 21:46:00 |      2 |     D | 2014-08-17 21:46:00
     2 |     D | 2014-08-17 21:46:00 |      3 |     C | 2014-08-17 21:45:00
     1 |     B | 2014-08-17 21:48:00 |      1 |     B | 2014-08-17 21:48:00
     1 |     B | 2014-08-17 21:48:00 |      2 |     D | 2014-08-17 21:46:00
     1 |     B | 2014-08-17 21:48:00 |      3 |     C | 2014-08-17 21:47:00
     3 |     A | 2014-08-17 21:49:00 |      1 |     B | 2014-08-17 21:48:00
     3 |     A | 2014-08-17 21:49:00 |      2 |     D | 2014-08-17 21:46:00 
     3 |     A | 2014-08-17 21:49:00 |      3 |     A | 2014-08-17 21:49:00 
     2 |     A | 2014-08-17 21:51:00 |      1 |     B | 2014-08-17 21:48:00 
     2 |     A | 2014-08-17 21:51:00 |      2 |     A | 2014-08-17 21:51:00 
     2 |     A | 2014-08-17 21:51:00 |      3 |     A | 2014-08-17 21:49:00

As you can see, I need the most recent row in 'data_table' for each sensor, for every row that exists in state_changed_view.

I just don't know how to get the SQL to get me the most recent row according to a particular timestamp.

This is on a PL/pgSQL system, so anything compatible with Postgres is handy.

Upvotes: 1

Views: 577

Answers (3)

Erwin Brandstetter
Erwin Brandstetter

Reputation: 656706

Query

For a small, given set of sensors to retrieve (this works for Postgres 8.4 or later):

SELECT c.sensor AS sensor_change
     , d1.state AS state_1, d1.stime AS stime_1
     , d2.state AS state_2, d2.stime AS stime_2
     , d3.state AS state_3, d3.stime AS stime_3
FROM  (
   SELECT sensor, stime
        , lag(state) OVER (PARTITION BY sensor ORDER BY stime)
           <> state AS change
        , max(CASE WHEN sensor = 1 THEN stime ELSE NULL END) OVER w AS last_1
        , max(CASE WHEN sensor = 2 THEN stime ELSE NULL END) OVER w AS last_2
        , max(CASE WHEN sensor = 3 THEN stime ELSE NULL END) OVER w AS last_3
   FROM   data d
   WINDOW w AS (ORDER BY stime)
   ) c
JOIN   data d1 ON d1.sensor = 1 AND d1.stime = c.last_1
JOIN   data d2 ON d2.sensor = 2 AND d2.stime = c.last_2
JOIN   data d3 ON d3.sensor = 3 AND d3.stime = c.last_3
WHERE  c.change
ORDER  BY c.stime;

Not using the view at all, building on the table directly, that's faster.

This is assuming a UNIQUE INDEX on (sensor, stime) to be unambiguous. Performance also heavily depends on such an index.

As opposed to @Nick's solution, building on JOIN LATERAL (Postgres 9.3+), this returns a single row with all values for every change in state.

PL/pgSQL function

Since you mentioned PL/pgSQL, I would expect this (highly optimized) plpgsql function to perform better, since it can make do with a single sequential scan of the table:

CREATE OR REPLACE FUNCTION f_sensor_change()
  RETURNS TABLE (sensor_change int   -- adapt to actual data types!
               , state_1 "char", stime_1 timestamp
               , state_2 "char", stime_2 timestamp
               , state_3 "char", stime_3 timestamp) AS
$func$
DECLARE
   r    data%rowtype;
BEGIN

FOR r IN
   TABLE data ORDER BY stime
LOOP
   CASE r.sensor
   WHEN 1 THEN  
      IF    r.state =  state_1 THEN  -- just save stime
         stime_1 := r.stime;
      ELSIF r.state <> state_1 THEN  -- save all & RETURN
         stime_1 := r.stime; state_1 := r.state;
         sensor_change := 1; RETURN NEXT;
      ELSE                           -- still NULL: init
         stime_1 := r.stime; state_1 := r.state;
      END IF;

   WHEN 2 THEN
      IF    r.state =  state_2 THEN
         stime_2 := r.stime;
      ELSIF r.state <> state_2 THEN
         stime_2 := r.stime; state_2 := r.state;
         sensor_change := 2; RETURN NEXT;
      ELSE
         stime_2 := r.stime; state_2 := r.state;
      END IF;

   WHEN 3 THEN
      IF    r.state =  state_3 THEN
         stime_3 := r.stime;
      ELSIF r.state <> state_3 THEN
         stime_3 := r.stime; state_3 := r.state;
         sensor_change := 3; RETURN NEXT;
      ELSE
         stime_3 := r.stime; state_3 := r.state;
      END IF;
   ELSE             -- do nothing, ignore other sensors
   END CASE;
END LOOP;

END
$func$ LANGUAGE plpgsql;

Call:

SELECT * FROM f_sensor_change();

Makes sense for repeated use. Related answer:

SQL Fiddle for Postgres 9.3.
SQL Fiddle for Postgres 8.4.

Upvotes: 3

Nick Barnes
Nick Barnes

Reputation: 21336

There are a couple of things making this not-so-straightforward:

  • You want to do a subquery for each state_changed_view row, but the subquery must mention the corresponding stime from the view (to restrict it to earlier records). Ordinary subqueries aren't allowed to depend on external fields, but you can accomplish this (as of Postgres 9.3, at least) with a lateral join.
  • You need not only MAX(data_table.stime), but the corresponding data_table.state. You could do this with another nested query to retrieve the rest of the row, but SELECT DISTINCT ON gives you an easy way to fetch the whole thing at once.

The end result is something like this:

SELECT *
FROM
  state_changed_view,
  LATERAL (
    SELECT DISTINCT ON (sensor)
      sensor,
      state,
      stime
    FROM
      data_table
    WHERE
      data_table.stime <= state_changed_view.stime
    ORDER BY
      sensor,
      stime DESC
  ) a

Upvotes: 1

Hedinn
Hedinn

Reputation: 864

Start by finding the max time for each sensor and state with a subquery which groups on sensor and state and then join that to the view

SELECT *
FROM 
(SELECT sensor, state, MAX(stime) as stime
from data_table
group by sensor, state) a
join state_changed_view on 1=1

Upvotes: 0

Related Questions