Reputation: 867
Lets say I am tracking the price of items, and I get hourly data only if the price has changed.
I would have a table like
CREATE TABLE prices (
item_id SERIAL PRIMARY KEY,
price DECIMAL,
timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
SELECT create_hypertable('prices', 'timestamp');
Every hour, I get a report consisting of (item_id, price) and I insert this in to the database.
Now I want to fetch the hourly report for the past 24 hours for and item, something like
SELECT item_id,
time_bucket('1 hour', timestamp) AS hour,
MAX(price) AS max_price
FROM prices
WHERE item_id = 1234
AND timestamp >= NOW() - INTERVAL '24 hours'
GROUP BY item_id, hour
ORDER BY hour;
This gives something like
item_id | hour | max_price
---------+---------------------+-----------
1234 | 2024-03-12 12:00:00 | 15.50
1234 | 2024-03-13 02:00:00 | 19.50
1234 | 2024-03-13 03:00:00 | 19.75
1234 | 2024-03-13 10:00:00 | 21.50
1234 | 2024-03-13 11:00:00 | 21.75
What I want is for the gaps between to be filled in with the previous price. The price might not change for days as well.
item_id | hour | max_price
---------+---------------------+-----------
1234 | 2024-03-12 12:00:00 | 15.50
1234 | 2024-03-12 13:00:00 | 15.50
1234 | 2024-03-12 14:00:00 | 15.50
1234 | 2024-03-12 15:00:00 | 15.50
1234 | 2024-03-12 16:00:00 | 15.50
1234 | 2024-03-12 17:00:00 | 15.50
1234 | 2024-03-12 18:00:00 | 15.50
1234 | 2024-03-12 19:00:00 | 15.50
1234 | 2024-03-12 20:00:00 | 15.50
1234 | 2024-03-12 21:00:00 | 15.50
1234 | 2024-03-12 22:00:00 | 15.50
1234 | 2024-03-12 23:00:00 | 15.50
1234 | 2024-03-13 00:00:00 | 15.50
1234 | 2024-03-13 01:00:00 | 15.50
1234 | 2024-03-13 02:00:00 | 19.50
1234 | 2024-03-13 03:00:00 | 19.75
1234 | 2024-03-13 04:00:00 | 19.75
1234 | 2024-03-13 05:00:00 | 19.75
1234 | 2024-03-13 06:00:00 | 19.75
1234 | 2024-03-13 07:00:00 | 19.75
1234 | 2024-03-13 08:00:00 | 19.75
1234 | 2024-03-13 09:00:00 | 19.75
1234 | 2024-03-13 10:00:00 | 21.50
1234 | 2024-03-13 11:00:00 | 21.75
Is it better to fill in this data on insert (not sure if there is a good way to do this or than querying for the last value of every item id and updating it) or do it in the select?
Is there a easy way to accomplish this in timescaledb?
Upvotes: 0
Views: 147
Reputation: 2143
You can easily fill in the gaps when selecting the data using generate_series
in your query. You can essentially generate a series of one row per hour for the timeframe you are looking for, and then join the records from your table onto that series:
with hours as (
select generate_series(
date_trunc('hour', now()) - INTERVAL '24 hours',
date_trunc('hour', now()),
'1 hour'::interval
) as hour
)
select hours.hour, MAX(prices.price) AS max_price, prices.item_id
from hours
left join prices on date_trunc('hour', prices.timestamp) = hours.hour
group by 1;
I'd steer away from filling in the gaps on insert for two reasons:
Upvotes: 0