Reputation: 49
I have a table of time-series data, with the columns:
sensor_number (integer primary key)
signal_strength (integer)
signal_time (timestamp)
Each sensor creates 20-30 rows per minute. I need a query that returns for a sensor 1 row per minute (or every 2 minutes, 3 minutes, etc). A pure SQL approach is to use a window function, with a partition on an expression that rounds the timestamp appropriately (date_trunc() works for the 1-minute case, otherwise I have to some messy casting) The problem is the expression blocks the ability to use the index. With 5B rows, that's a killer.
The best alternative I can come up with is a user-defined function that uses a cursor to step through the table in index key order (sensor_number, signal_time) and outputting a row every time the timestamp crosses a minute boundary. That's still slow though. Is there a pure SQL approach that'll accomplish this AND utilize the index?
Upvotes: 0
Views: 155
Reputation: 1952
I think if you're returning enough rows, scanning the whole range of rows that match the sensor_number will just be the best plan. The signal_time portion of the index may simply not be helpful at that point, because the database needs to read so many rows anyway.
However, if your time interval is big enough / the number of rows you're returning is small enough, it might be more efficient to hit the index separately for each row you're returning. Something like this (using an interval of 3 minutes and a sensor number of 5 as an example):
WITH range AS (
SELECT
max(signal_time) as max_time,
min(signal_time) as min_time
FROM timeseries
WHERE sensor_number = 5
)
SELECT sample.*
FROM range
JOIN generate_series(min_time, max_time, interval '3 minutes') timestamp ON true
JOIN LATERAL (
SELECT *
FROM timeseries
WHERE sensor_number = 5
AND signal_time >= timestamp
AND signal_time < timestamp + interval '3 minutes'
LIMIT 1
) sample ON true;
Upvotes: 1