franklynd

Reputation: 1940

How can I select rows with most recent timestamp for each key value?

I have a table of sensor data. Each row has a sensor id, a timestamp, and other fields. I want to select a single row with latest timestamp for each sensor, including some of the other fields.

I thought that the solution would be to group by sensor id and then order by max(timestamp) like so:

SELECT sensorID,timestamp,sensorField1,sensorField2 
FROM sensorTable 
GROUP BY sensorID 
ORDER BY max(timestamp);

This gives me an error saying that "sensorField1 must appear in the group by clause or be used in an aggregate."

What is the correct way to approach this problem?

Upvotes: 149

Views: 415032

Answers (9)

Internetbug256

Reputation: 143

I know this is an ancient post, but in my case I was looking for a solution that also takes performance into account, since my table has millions of rows.

My solution was to create a temporary table on the fly holding the maximum timestamp per sensor, then join it with the original table. The difference in speed is huge:

CREATE TEMPORARY TABLE sensorTable_temp AS
SELECT sensorID, MAX(timestamp) AS max_t FROM sensorTable GROUP BY sensorID;

SELECT a.sensorID, a.timestamp, a.sensorField1, a.sensorField2
FROM sensorTable a
INNER JOIN sensorTable_temp b
  ON a.sensorID = b.sensorID AND a.timestamp = b.max_t;

The temporary table lives for the session only, so there is no need to drop it after running the query.

Of course, an index on the timestamp column helps a lot too (but it was not enough in my case).
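Below is a minimal, runnable sketch of the two-step approach. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the real server. The sample rows are made up, and the timestamps are plain integers for brevity.

```python
import sqlite3

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

# Step 1: materialise the latest timestamp per sensor in a session-scoped table.
conn.execute(
    """
    CREATE TEMPORARY TABLE sensorTable_temp AS
    SELECT sensorID, MAX(timestamp) AS max_t
    FROM sensorTable
    GROUP BY sensorID
    """
)

# Step 2: join back to the full table to recover the non-aggregated columns.
result = conn.execute(
    """
    SELECT a.sensorID, a.timestamp, a.sensorField1, a.sensorField2
    FROM sensorTable a
    INNER JOIN sensorTable_temp b
      ON a.sensorID = b.sensorID AND a.timestamp = b.max_t
    ORDER BY a.sensorID
    """
).fetchall()

print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]
```

Whether the temporary table actually beats a single query depends on the engine and its indexes. The sketch only shows that the two steps return the right rows.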

Upvotes: 1

Svet

Reputation: 1730

In Postgres this can be done in a relatively elegant way using SELECT DISTINCT ON, as follows:

SELECT DISTINCT ON (sensorID)
sensorID, timestamp, sensorField1, sensorField2 
FROM sensorTable
ORDER BY sensorID, timestamp DESC;

Some more info here. Note that DISTINCT ON is a PostgreSQL extension rather than standard SQL, so it does not work in most other SQL flavors, including MySQL (link - thanks for the tip @silentsurfer).

In case it's not obvious, what this does is sort the table by sensor ID and timestamp (newest to oldest), and then returns the first row (i.e. latest timestamp) for each unique sensor ID.

In my use case I have ~10M readings from ~1K sensors, so trying to join the table with itself on a timestamp-based filter is very resource-intensive; the above takes a couple of seconds.
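Because DISTINCT ON is PostgreSQL-only, here is a small pure-Python sketch of its semantics rather than a query against Postgres. It sorts by sensor ID and then by timestamp descending, and keeps the first row of each sensor group. The sample readings are hypothetical. One difference to keep in mind: Postgres breaks ties on equal timestamps unpredictably, while this sketch keeps the input order because Python's sort is stable.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
rows = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]


def distinct_on_latest(readings):
    """Mimic: SELECT DISTINCT ON (sensorID) ... ORDER BY sensorID, timestamp DESC."""
    # ORDER BY sensorID, timestamp DESC: sort by the secondary key first;
    # the second (stable) sort by sensorID keeps newest-first within each sensor.
    by_time = sorted(readings, key=itemgetter(1), reverse=True)
    ordered = sorted(by_time, key=itemgetter(0))
    # DISTINCT ON (sensorID): keep only the first row of each sensorID group.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(0))]


result = distinct_on_latest(rows)
print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]
```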

Upvotes: 104

eci

Reputation: 2412

I also wanted to give an answer using a NOT EXISTS clause:

SELECT sensorID,timestamp,sensorField1,sensorField2 
FROM sensorTable t1
where not exists
( select * from sensorTable t2 where t1.sensorId=t2.sensorId
  and t1.timestamp < t2.timestamp );

which, depending on your DBMS and its query optimizer, might be an efficient choice.
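A minimal sketch of the NOT EXISTS pattern, run against an in-memory SQLite database. SQLite is only a convenient stand-in here, and the rows are hypothetical. A row is kept only if no other row for the same sensor has a later timestamp.

```python
import sqlite3

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

# Keep t1 only when no newer reading (t2) exists for the same sensor.
result = conn.execute(
    """
    SELECT sensorID, timestamp, sensorField1, sensorField2
    FROM sensorTable t1
    WHERE NOT EXISTS (
        SELECT 1 FROM sensorTable t2
        WHERE t1.sensorID = t2.sensorID
          AND t1.timestamp < t2.timestamp
    )
    ORDER BY sensorID
    """
).fetchall()

print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]
```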

Upvotes: 0

Joel Coehoorn

Reputation: 415600

WITH SensorTimes As (
   SELECT sensorID, MAX(timestamp) "LastReading"
   FROM sensorTable
   GROUP BY sensorID
)
SELECT s.sensorID,s.timestamp,s.sensorField1,s.sensorField2 
FROM sensorTable s
INNER JOIN SensorTimes t on s.sensorID = t.sensorID and s.timestamp = t.LastReading

Eight years later this answer just got an upvote, so I should point out that this is the old way to do it. The newer way uses the ROW_NUMBER() window function or an APPLY (lateral) join.
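A minimal sketch of the CTE + join query, using in-memory SQLite as a stand-in. The data is hypothetical and deliberately contains two readings for sensor 2 with the same latest timestamp. This shows a property of joining on MAX(timestamp): tied rows are all returned, not just one.

```python
import sqlite3

# Hypothetical sample readings; sensor 2 has two rows tied at timestamp 150.
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 150, "c2", "z2"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(
    """
    WITH SensorTimes AS (
        SELECT sensorID, MAX(timestamp) AS LastReading
        FROM sensorTable
        GROUP BY sensorID
    )
    SELECT s.sensorID, s.timestamp, s.sensorField1, s.sensorField2
    FROM sensorTable s
    INNER JOIN SensorTimes t
      ON s.sensorID = t.sensorID AND s.timestamp = t.LastReading
    ORDER BY s.sensorID, s.sensorField1
    """
).fetchall()

# Both tied rows for sensor 2 survive the join.
print(result)
```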

Upvotes: 8

Jamie Marshall

Reputation: 2294

There is one common answer I haven't seen here yet: the window function. It is an alternative to the correlated sub-query, if your DB supports it.

SELECT sensorID, timestamp, sensorField1, sensorField2
FROM (
    SELECT sensorID, timestamp, sensorField1, sensorField2
        , ROW_NUMBER() OVER(
            PARTITION BY sensorID
            ORDER BY timestamp DESC  -- newest reading per sensor gets rn = 1
        ) AS rn
    FROM sensorTable
) ranked
WHERE rn = 1
ORDER BY sensorID, timestamp;

I actually use this more than correlated sub-queries. Feel free to bust me in the comments over efficiency; I'm not too sure how it stacks up in that regard.
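A minimal sketch of the ROW_NUMBER() query running in in-memory SQLite. The rows are hypothetical. SQLite only added window functions in version 3.25.0, so the sketch checks the linked library version first. Unlike the MAX()-join answers, ROW_NUMBER() returns exactly one row per sensor even if timestamps tie; which tied row you get is unspecified unless you add a tiebreaker to the ORDER BY.

```python
import sqlite3

# Window functions need SQLite >= 3.25.0 (bundled with most recent Pythons).
if sqlite3.sqlite_version_info < (3, 25, 0):
    raise RuntimeError("SQLite too old for window functions: " + sqlite3.sqlite_version)

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(
    """
    SELECT sensorID, timestamp, sensorField1, sensorField2
    FROM (
        SELECT sensorID, timestamp, sensorField1, sensorField2,
               ROW_NUMBER() OVER (
                   PARTITION BY sensorID
                   ORDER BY timestamp DESC  -- newest first, so rn = 1 is latest
               ) AS rn
        FROM sensorTable
    ) ranked
    WHERE rn = 1
    ORDER BY sensorID
    """
).fetchall()

print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]
```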

Upvotes: 10

fancyPants

Reputation: 51868

For the sake of completeness, here's another possible solution:

SELECT sensorID,timestamp,sensorField1,sensorField2 
FROM sensorTable s1
WHERE timestamp = (SELECT MAX(timestamp) FROM sensorTable s2 WHERE s1.sensorID = s2.sensorID)
ORDER BY sensorID, timestamp;

Pretty self-explanatory, I think, but here's more info if you wish, as well as other examples. It's from the MySQL manual, but the query above works with every RDBMS that implements the SQL-92 standard.
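A minimal sketch of the correlated-subquery version running in in-memory SQLite. SQLite is just a convenient stand-in, and the rows are hypothetical. For each outer row, the subquery recomputes that sensor's MAX(timestamp), and only the row matching it is kept.

```python
import sqlite3

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

# The inner query is correlated: it refers to s1.sensorID from the outer row.
result = conn.execute(
    """
    SELECT sensorID, timestamp, sensorField1, sensorField2
    FROM sensorTable s1
    WHERE timestamp = (
        SELECT MAX(timestamp) FROM sensorTable s2
        WHERE s1.sensorID = s2.sensorID
    )
    ORDER BY sensorID, timestamp
    """
).fetchall()

print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]
```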

Upvotes: 133

Hucker

Reputation: 681

I had mostly the same problem and ended up with a different solution that makes this type of problem trivial to query.

I have a table of sensor data (1 minute data from about 30 sensors)

SensorReadings->(timestamp,value,idSensor)

and I have a sensor table that has lots of mostly static stuff about the sensor but the relevant fields are these:

Sensors->(idSensor,Description,tvLastUpdate,tvLastValue,...)

The tvLastUpdate and tvLastValue columns are set by a trigger on inserts into the SensorReadings table, so I always have direct access to these values without needing any expensive queries. This denormalizes the schema slightly. The query is trivial:

SELECT idSensor,Description,tvLastUpdate,tvLastValue 
FROM Sensors

I use this method for data that is queried often. In my case I have a sensor table, and a large event table, that have data coming in at the minute level AND dozens of machines are updating dashboards and graphs with that data. With my data scenario the trigger-and-cache method works well.
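Here is a minimal sketch of the trigger-and-cache idea in SQLite trigger syntax. It is not the author's actual trigger. The column types, the sensor descriptions and the trigger name `trg_readings_latest` are hypothetical. The `NEW.timestamp >= tvLastUpdate` guard is also an assumption on my part: it stops a late-arriving older reading from overwriting a newer cached value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Sensors (
        idSensor     INTEGER PRIMARY KEY,
        Description  TEXT,
        tvLastUpdate INTEGER,  -- cached timestamp of the newest reading
        tvLastValue  REAL      -- cached value of the newest reading
    );
    CREATE TABLE SensorReadings (
        timestamp INTEGER,
        value     REAL,
        idSensor  INTEGER REFERENCES Sensors(idSensor)
    );
    -- Hypothetical trigger: refresh the cache on every insert, unless the
    -- incoming reading is older than the one already cached.
    CREATE TRIGGER trg_readings_latest
    AFTER INSERT ON SensorReadings
    BEGIN
        UPDATE Sensors
        SET tvLastUpdate = NEW.timestamp,
            tvLastValue  = NEW.value
        WHERE idSensor = NEW.idSensor
          AND (tvLastUpdate IS NULL OR NEW.timestamp >= tvLastUpdate);
    END;
    """
)

conn.executemany(
    "INSERT INTO Sensors (idSensor, Description) VALUES (?, ?)",
    [(1, "boiler"), (2, "intake")],
)
# The last insert for sensor 2 is older than its cached reading and is ignored.
conn.executemany(
    "INSERT INTO SensorReadings VALUES (?, ?, ?)",
    [(100, 20.5, 1), (160, 21.0, 1), (130, 5.0, 2), (110, 4.0, 2)],
)

# Reading the latest values is now a plain scan of the small Sensors table.
result = conn.execute(
    "SELECT idSensor, Description, tvLastUpdate, tvLastValue "
    "FROM Sensors ORDER BY idSensor"
).fetchall()

print(result)  # [(1, 'boiler', 160, 21.0), (2, 'intake', 130, 5.0)]
```

The trade-off is that every insert into SensorReadings pays for an extra UPDATE. In exchange, dashboards never have to search the large readings table.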

Upvotes: 0

dognose

Reputation: 20889

You can join the table with itself (on sensor id) and add L.timestamp < R.timestamp as a join condition. Then you pick the rows where R.sensorID is null. Voilà, you get the latest entry per sensor.

http://sqlfiddle.com/#!9/45147/37

SELECT L.* FROM sensorTable L
LEFT JOIN sensorTable R ON
L.sensorID = R.sensorID AND
L.timestamp < R.timestamp
WHERE R.sensorID IS NULL

Please note, however, that this will be very resource-intensive if you have few ids and many values per id! So I wouldn't recommend it for measurement data, where each sensor collects a value every minute. In a use case where you need to track "revisions" of something that changes only occasionally, it works fine.
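A minimal sketch of the self-LEFT-JOIN anti-join in in-memory SQLite, with hypothetical data. The second part illustrates the cost warning above. For one sensor with n readings, the unfiltered join pairs every reading with every later one: n(n-1)/2 matched rows plus one unmatched row. Whether an engine really materialises all of them depends on its optimizer.

```python
import sqlite3

SCHEMA = (
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

# A row with no newer partner (R.sensorID IS NULL) is the latest for its sensor.
result = conn.execute(
    """
    SELECT L.* FROM sensorTable L
    LEFT JOIN sensorTable R
      ON L.sensorID = R.sensorID AND L.timestamp < R.timestamp
    WHERE R.sensorID IS NULL
    ORDER BY L.sensorID
    """
).fetchall()
print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]

# Cost illustration: one sensor with 100 readings.
busy = sqlite3.connect(":memory:")
busy.execute(SCHEMA)
busy.executemany(
    "INSERT INTO sensorTable VALUES (?, ?, ?, ?)",
    [(1, t, "f1", "f2") for t in range(100)],
)
# Size of the join *before* the IS NULL filter throws almost all of it away.
joined_rows = busy.execute(
    """
    SELECT COUNT(*) FROM sensorTable L
    LEFT JOIN sensorTable R
      ON L.sensorID = R.sensorID AND L.timestamp < R.timestamp
    """
).fetchone()[0]
print(joined_rows)  # 4951 = 100 * 99 / 2 matched pairs + 1 unmatched row
```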

Upvotes: 24

juergen d

Reputation: 204746

You can only select columns that are in the GROUP BY clause or used in an aggregate function. You can use a join to get this working:

select s1.* 
from sensorTable s1
inner join 
(
  SELECT sensorID, max(timestamp) as mts
  FROM sensorTable 
  GROUP BY sensorID 
) s2 on s2.sensorID = s1.sensorID and s1.timestamp = s2.mts
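A minimal sketch of this derived-table join, using in-memory SQLite as a stand-in and hypothetical rows. The subquery `s2` only uses grouped and aggregated columns, which is what the GROUP BY rule demands. The outer join back to `s1` then supplies the remaining fields.

```python
import sqlite3

# Hypothetical sample readings: (sensorID, timestamp, sensorField1, sensorField2)
ROWS = [
    (1, 100, "a", "x"),
    (1, 200, "b", "y"),
    (2, 150, "c", "z"),
    (2, 120, "d", "w"),
    (3, 50, "e", "v"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sensorTable ("
    "sensorID INTEGER, timestamp INTEGER, sensorField1 TEXT, sensorField2 TEXT)"
)
conn.executemany("INSERT INTO sensorTable VALUES (?, ?, ?, ?)", ROWS)

result = conn.execute(
    """
    SELECT s1.*
    FROM sensorTable s1
    INNER JOIN (
        -- Only grouped/aggregated columns here, so GROUP BY is satisfied.
        SELECT sensorID, MAX(timestamp) AS mts
        FROM sensorTable
        GROUP BY sensorID
    ) s2 ON s2.sensorID = s1.sensorID AND s1.timestamp = s2.mts
    ORDER BY s1.sensorID
    """
).fetchall()

print(result)  # [(1, 200, 'b', 'y'), (2, 150, 'c', 'z'), (3, 50, 'e', 'v')]
```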

Upvotes: 32
