TreeWater
TreeWater

Reputation: 867

Should I fill gaps on insert or on select in timescaledb for a timeseries

Lets say I am tracking the price of items, and I get hourly data only if the price has changed.

I would have a table like

CREATE TABLE prices (
    item_id SERIAL PRIMARY KEY,
    price DECIMAL,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

SELECT create_hypertable('prices', 'timestamp');

Every hour, I get a report consisting of (item_id, price) and I insert this in to the database.

Now I want to fetch the hourly report for the past 24 hours for and item, something like

SELECT item_id,
       time_bucket('1 hour', timestamp) AS hour,
       MAX(price) AS max_price
FROM prices
WHERE item_id = 1234
  AND timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY item_id, hour
ORDER BY hour;

This gives something like

 item_id |        hour         | max_price 
---------+---------------------+-----------
    1234 | 2024-03-12 12:00:00 |     15.50
    1234 | 2024-03-13 02:00:00 |     19.50
    1234 | 2024-03-13 03:00:00 |     19.75
    1234 | 2024-03-13 10:00:00 |     21.50
    1234 | 2024-03-13 11:00:00 |     21.75

What I want is for the gaps between to be filled in with the previous price. The price might not change for days as well.

 item_id |        hour         | max_price 
---------+---------------------+-----------
    1234 | 2024-03-12 12:00:00 |     15.50
    1234 | 2024-03-12 13:00:00 |     15.50
    1234 | 2024-03-12 14:00:00 |     15.50
    1234 | 2024-03-12 15:00:00 |     15.50
    1234 | 2024-03-12 16:00:00 |     15.50
    1234 | 2024-03-12 17:00:00 |     15.50
    1234 | 2024-03-12 18:00:00 |     15.50
    1234 | 2024-03-12 19:00:00 |     15.50
    1234 | 2024-03-12 20:00:00 |     15.50
    1234 | 2024-03-12 21:00:00 |     15.50
    1234 | 2024-03-12 22:00:00 |     15.50
    1234 | 2024-03-12 23:00:00 |     15.50
    1234 | 2024-03-13 00:00:00 |     15.50
    1234 | 2024-03-13 01:00:00 |     15.50
    1234 | 2024-03-13 02:00:00 |     19.50
    1234 | 2024-03-13 03:00:00 |     19.75
    1234 | 2024-03-13 04:00:00 |     19.75
    1234 | 2024-03-13 05:00:00 |     19.75
    1234 | 2024-03-13 06:00:00 |     19.75
    1234 | 2024-03-13 07:00:00 |     19.75
    1234 | 2024-03-13 08:00:00 |     19.75
    1234 | 2024-03-13 09:00:00 |     19.75
    1234 | 2024-03-13 10:00:00 |     21.50
    1234 | 2024-03-13 11:00:00 |     21.75

Is it better to fill in this data on insert (not sure if there is a good way to do this or than querying for the last value of every item id and updating it) or do it in the select?

Is there a easy way to accomplish this in timescaledb?

Upvotes: 0

Views: 147

Answers (1)

byteSlayer
byteSlayer

Reputation: 2143

You can easily fill in the gaps when selecting the data using generate_series in your query. You can essentially generate a series of one row per hour for the timeframe you are looking for, and then join the records from your table onto that series:

with hours as (
  select generate_series(
    date_trunc('hour', now()) - INTERVAL '24 hours',
    date_trunc('hour', now()),
    '1 hour'::interval
  ) as hour
)

select hours.hour, MAX(prices.price) AS max_price, prices.item_id
from hours
left join prices on date_trunc('hour', prices.timestamp) = hours.hour
group by 1;

I'd steer away from filling in the gaps on insert for two reasons:

  1. If your organic data is pretty sparse (you don't update the prices very often) then you would just end up with a lot more rows than you have too that you are storing for no good reason.
  2. You'd be dependent on some process inserting the rows for the gaps. If that process fails, your data will become sparse again...

Upvotes: 0

Related Questions