Reputation: 83398
I have a query that fetches the latest price and the price 24 hours earlier for multiple assets in TimescaleDB. My application then calculates the percentage price change on the application side. Because there might not be any trades, the latest price for a time bucket might not be available, so I need to fill the gap: if there is no value in this time bucket, then take whatever the previous value is. The price change is then stored in a normal PostgreSQL table, as a denormalised value, to be faster and more useful with normal PostgreSQL queries, e.g. queries similar to JOIN latest_price.
The queries are performed on TimescaleDB continuous aggregate views.
I am using last(), and I am aware that fetching the latest value has been subject to discussion before. However, my question concerns the special case of fetching multiple latest and close-to-latest values at once.
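For context, the aggregates referenced below are assumed to look roughly like this (a minimal sketch; the base hypertable trades and its price/ts columns are hypothetical):
-- Hypothetical sketch of the 15-minute continuous aggregate;
-- the base hypertable "trades" and its columns are illustrative only
CREATE MATERIALIZED VIEW candle_ohlcvx_aggregate_15m
WITH (timescaledb.continuous) AS
SELECT
    pair_id,
    time_bucket(INTERVAL '15 minutes', ts) AS bucket,
    first(price, ts) AS open,
    max(price) AS high,
    min(price) AS low,
    last(price, ts) AS close
FROM trades
GROUP BY pair_id, bucket;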
Currently, my method for calculating the price and the 24h price delta does it with two last() queries, clocking 100 seconds wall time in production, whereas a similar reference query that uses a naive GROUP BY instead of last() does it in 1 second.
What would be the best approach to optimise a query that needs to fetch multiple latest values for a group of items from TimescaleDB, and then perform calculations on them? Would subqueries be faster? What would an example subquery look like?
Would creating a trigger that stores the latest value when candle_ohlcvx_aggregate_1h is updated make sense? Do triggers work with continuous aggregate views? What would such a trigger look like?
My price calculation queries:
-- Get the latest price
SELECT
pair_id,
last(close, bucket) as last_close
FROM candle_ohlcvx_aggregate_15m
WHERE (bucket < :now_ts) AND (pair_id IN :pair_ids)
GROUP BY pair_id;
-- Get the latest price 24h ago
SELECT
pair_id,
last(close, bucket) as last_close
FROM candle_ohlcvx_aggregate_15m
WHERE (bucket < (:now_ts - INTERVAL '1 DAY')) AND (pair_id IN :pair_ids)
GROUP BY pair_id;
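For reference, a minimal sketch of the calculation the application performs with these two result sets, expressed in SQL (the CTE bodies are the two queries above; in production the percentage is computed on the application side):
-- Sketch only: combine latest and 24h-ago closes into a percentage change
WITH latest AS (
    SELECT pair_id, last(close, bucket) AS last_close
    FROM candle_ohlcvx_aggregate_15m
    WHERE (bucket < :now_ts) AND (pair_id IN :pair_ids)
    GROUP BY pair_id
), day_ago AS (
    SELECT pair_id, last(close, bucket) AS last_close
    FROM candle_ohlcvx_aggregate_15m
    WHERE (bucket < (:now_ts - INTERVAL '1 DAY')) AND (pair_id IN :pair_ids)
    GROUP BY pair_id
)
SELECT
    l.pair_id,
    (l.last_close - d.last_close) / d.last_close * 100.0 AS price_change_pct
FROM latest l
JOIN day_ago d USING (pair_id);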
My reference query without last(), which clocks 1 second wall time, calculates the volume for one day using an hourly bucket, but it does not gap-fill the last value using last(), which makes it fast:
SELECT
pair_id, SUM(buy_volume) + SUM(sell_volume) as vol
FROM candle_ohlcvx_aggregate_1h
WHERE (bucket >= :now_ts - INTERVAL '1 DAY') AND (bucket < :now_ts) AND (pair_id IN :pair_ids)
GROUP BY pair_id
ORDER BY pair_id ASC;
"""
Upvotes: 3
Views: 530
Reputation: 1053
Both of those queries are going to scan the entire continuous aggregate. They could be done on the base table (not the continuous aggregate) by using our distinct optimization: https://blog.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
SELECT DISTINCT ON (pair_id)
*
FROM base
WHERE (ts < :now_ts) AND (pair_id IN :pair_ids)
ORDER BY pair_id, ts DESC;
-- and
SELECT DISTINCT ON (pair_id)
*
FROM base
WHERE (ts < :now_ts - INTERVAL '1 DAY') AND (pair_id IN :pair_ids)
ORDER BY pair_id, ts DESC;
You'll need to have an index on (pair_id, ts)
for this to be efficient. We are looking at ways to make this more efficient on continuous aggregates, but right now the distinct optimization doesn't really work on them because of the grouping they do under the hood.
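For example, with the base hypertable called base as in the queries above, a minimal sketch:
-- Composite index so each per-pair latest-row lookup becomes a short
-- (backwards) index scan; new chunks inherit the index automatically
CREATE INDEX ON base (pair_id, ts);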
Another option would be to do a lateral subquery from something like your pairs table (assuming one exists; you could also just unnest an array of pairs in a subquery, as in the sketch after the two queries below):
SELECT p.pair_id,
c.close
FROM pairs p,
     LATERAL (SELECT close
              FROM candle_ohlcvx_aggregate_15m ca
              WHERE ca.bucket < :now_ts
                AND ca.pair_id = p.pair_id
              ORDER BY ca.bucket DESC
              LIMIT 1) c
WHERE (p.pair_id IN :pair_ids);
SELECT p.pair_id,
c.close
FROM pairs p,
     LATERAL (SELECT close
              FROM candle_ohlcvx_aggregate_15m ca
              WHERE ca.bucket < :now_ts - INTERVAL '1 DAY'
                AND ca.pair_id = p.pair_id
              ORDER BY ca.bucket DESC
              LIMIT 1) c
WHERE (p.pair_id IN :pair_ids);
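If there is no pairs table, a hypothetical sketch of the unnest variant mentioned above, assuming :pair_ids is bound as an integer array:
-- Sketch only: drive the lateral lookup from an unnested array parameter
SELECT
    p.pair_id,
    c.close
FROM unnest(:pair_ids::int[]) AS p(pair_id),
     LATERAL (SELECT close
              FROM candle_ohlcvx_aggregate_15m ca
              WHERE ca.bucket < :now_ts
                AND ca.pair_id = p.pair_id
              ORDER BY ca.bucket DESC
              LIMIT 1) c;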
This will potentially be slightly less efficient, but it should at least avoid scanning the entire continuous aggregate, I think.
Upvotes: 3