Mikko Ohtamaa

Reputation: 83398

Optimising fetching multiple latest values for the last() and GROUP BY in TimescaleDB

I have a query that fetches the latest price and the price 24h earlier for multiple assets in TimescaleDB. My application then calculates the % price change on the application side. Because there might not be any trades, the latest price for a time bucket might not be available, so I need to fill the gap: "if there is no value in this time bucket, then get whatever the previous value is". The price change is then stored in a normal PostgreSQL table as a denormalised value, to be faster and more useful with normal PostgreSQL queries, with queries similar to JOIN latest_price.

The queries are performed on TimescaleDB continuous aggregate views.

I am using last() and I am aware that fetching the latest value has been the subject of discussion before. However, my question concerns a special case: fetching multiple latest and close-to-latest values at once.

Currently, my price and 24h price delta calculation uses two last() queries, clocking 100 seconds of wall time in production, whereas a similar reference query that uses a naive GROUP BY instead of last() does it in 1 second.

What would be the best approach to optimise a query that needs to fetch multiple latest values for a group of items from TimescaleDB, and then perform calculations on them? Would subqueries be faster? What would an example subquery look like?

Would creating a trigger that stores the latest value when candle_ohlcvx_aggregate_1h is updated make sense? Do triggers work with continuous aggregate views? What would this trigger look like?

My price calculation queries:


        -- Get the latest price
        SELECT
            pair_id,
            last(close, bucket) as last_close
        FROM candle_ohlcvx_aggregate_15m
        WHERE (bucket < :now_ts) AND (pair_id IN :pair_ids) 
        GROUP BY pair_id;

        -- Get the latest price 24h ago
        SELECT
            pair_id,
            last(close, bucket) as last_close
        FROM candle_ohlcvx_aggregate_15m
        WHERE (bucket < (:now_ts - INTERVAL '1 DAY')) AND (pair_id IN :pair_ids) 
        GROUP BY pair_id;
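
For reference, the % price change computed on the application side is equivalent to joining the two result sets on pair_id. A minimal sketch that combines the two queries above with CTEs and does the calculation in SQL (not my production code):

        -- Sketch: combine the two last() queries and compute the
        -- 24h price change directly in SQL.
        WITH latest AS (
            SELECT pair_id, last(close, bucket) as last_close
            FROM candle_ohlcvx_aggregate_15m
            WHERE (bucket < :now_ts) AND (pair_id IN :pair_ids)
            GROUP BY pair_id
        ),
        day_ago AS (
            SELECT pair_id, last(close, bucket) as last_close
            FROM candle_ohlcvx_aggregate_15m
            WHERE (bucket < (:now_ts - INTERVAL '1 DAY')) AND (pair_id IN :pair_ids)
            GROUP BY pair_id
        )
        SELECT l.pair_id,
               (l.last_close - d.last_close) / d.last_close * 100 AS change_pct
        FROM latest l
        JOIN day_ago d USING (pair_id);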

My reference query without last(), which clocks 1 second of wall time, calculates the volume for one day using an hourly bucket, but does not gap-fill the last value using last(), which makes it fast:

        SELECT
            pair_id, SUM(buy_volume) + SUM(sell_volume) as vol
        FROM candle_ohlcvx_aggregate_1h
        WHERE (bucket >= :now_ts - INTERVAL '1 DAY') AND (bucket < :now_ts) AND (pair_id IN :pair_ids)
        GROUP BY pair_id
        ORDER BY pair_id ASC;
    """

Upvotes: 3

Views: 530

Answers (1)

davidk

Reputation: 1053

Both of those queries are going to scan the entire continuous aggregate. This query could be done on the base table (not the continuous aggregate) by using our distinct optimization: https://blog.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/

SELECT DISTINCT ON (pair_id) *
FROM base
WHERE (ts < :now_ts) AND (pair_id IN :pair_ids)
ORDER BY pair_id, ts DESC;

-- and 

SELECT DISTINCT ON (pair_id) *
FROM base
WHERE (ts < :now_ts - INTERVAL '1 DAY') AND (pair_id IN :pair_ids)
ORDER BY pair_id, ts DESC;

You'll need to have an index on (pair_id, ts) for this to be efficient. We are looking at ways to make this more efficient on continuous aggregates, but right now the distinct optimization doesn't really work on them because of the grouping they do under the hood.
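
For example, a minimal sketch of such an index, assuming the base hypertable and column names used in the queries above:

-- Assumed names: the "base" hypertable with pair_id and ts columns, as above.
CREATE INDEX base_pair_id_ts_idx ON base (pair_id, ts DESC);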

Another option would be to do a lateral subquery from something like your pairs table (assuming one exists; you could also just unnest an array of pairs in a subquery or something, see the sketch after these queries):


SELECT p.pair_id,
       c.close
FROM pairs p,
     LATERAL (
         SELECT close
         FROM candle_ohlcvx_aggregate_15m ca
         WHERE ca.bucket < :now_ts AND ca.pair_id = p.pair_id
         ORDER BY ca.bucket DESC
         LIMIT 1
     ) c
WHERE p.pair_id IN :pair_ids;

SELECT p.pair_id,
       c.close
FROM pairs p,
     LATERAL (
         SELECT close
         FROM candle_ohlcvx_aggregate_15m ca
         WHERE ca.bucket < :now_ts - INTERVAL '1 DAY' AND ca.pair_id = p.pair_id
         ORDER BY ca.bucket DESC
         LIMIT 1
     ) c
WHERE p.pair_id IN :pair_ids;
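
A sketch of the unnest variant mentioned above, for when no pairs table exists; this assumes :pair_ids binds as an array rather than an IN list:

-- Derive the pair list by unnesting an array parameter instead of
-- reading from a pairs table; repeat with the 1 DAY offset for the 24h price.
SELECT p.pair_id,
       c.close
FROM unnest(:pair_ids) AS p(pair_id),
     LATERAL (
         SELECT close
         FROM candle_ohlcvx_aggregate_15m ca
         WHERE ca.bucket < :now_ts AND ca.pair_id = p.pair_id
         ORDER BY ca.bucket DESC
         LIMIT 1
     ) c;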

This will potentially be slightly less efficient, but it should at least avoid scanning the entire continuous aggregate, I think.

Upvotes: 3
