Reputation: 4032
I have a QuestDB table storing market data, and I need to find the first recorded trade for each stock symbol in the past month.
This is my schema
CREATE TABLE 'trades' (
symbol SYMBOL CAPACITY 256 CACHE,
side SYMBOL CAPACITY 256 CACHE,
price DOUBLE,
amount DOUBLE,
timestamp TIMESTAMP
) timestamp(timestamp) PARTITION BY DAY WAL;
Sample Data
symbol | side | price | amount | timestamp |
---|---|---|---|---|
BTC-USDT | buy | 84717.7 | 0.00177074 | 2025-02-28T00:00:00.114000Z |
BTC-USDT | buy | 84718.2 | 0.00235000 | 2025-02-28T00:00:05.210000Z |
ETH-USDT | buy | 2307.77 | 0.064997 | 2025-02-28T00:00:00.520999Z |
ETH-USDT | buy | 2308.50 | 0.032100 | 2025-02-28T00:00:10.620000Z |
DOGE-USDT | buy | 0.20691 | 99.934973 | 2025-02-28T00:00:00.542000Z |
DOGE-USDT | buy | 0.20694 | 50.000000 | 2025-02-28T00:00:02.847000Z |
SOL-USDT | buy | 137.70 | 0.014524 | 2025-02-28T00:00:00.604000Z |
SOL-USDT | buy | 138.20 | 0.500000 | 2025-02-28T00:00:07.340000Z |
I can get the first trade per symbol with min
MIN(timestamp)
, like this:
SELECT symbol, min(timestamp) FROM trades
WHERE timestamp IN today();
But this returns just the first timestamp per symbol, and I need the whole row. If I wanted the last row, I could do:
SELECT * from trades latest on timestamp partition by symbol;
But I want the reverse of tat. I tried adding LATEST ON timestamp DESC
but then it complains.
Is there an equivalent to LATEST ON
for EARLIEST ON
? Otherwise, How can I modify this query to get the first row of the day for each different symbol?
Upvotes: 0
Views: 10
Reputation: 4032
This can be solved in at least two different ways.
The first one would be to group by symbol and just get the FIRST
value for every column, as in
select symbol, first(timestamp) as timestamp, first(side) as side, first(price) as price, first(amount) as amount from trades
where timestamp in today();
Note in QuestDB data is implicitly sorted by increasing timestamp, so no need to do an ORDER BY
before doing the GROUP BY
.
This solution should work fine and be efficient. The only downside to me is that you need
to write a lot of FIRST..AS
to aggregate and to then get the columns with the right name.
An alternative I sometimes use when I have many columns is to use a window function. It is slower, so I tend to use it only when querying a small part of my data. Since our query is scoped to today()
we should be OK with this alternative.
WITH ranked_trades AS (
SELECT
*,
row_number() OVER (PARTITION BY symbol ORDER BY timestamp) AS rn
FROM trades
WHERE timestamp IN today()
)
SELECT *
FROM ranked_trades
WHERE rn = 1;
This query would first find the position number (as rn
) for each row, relative to rows with the same symbol, and then on a second pass it would filter only the ones with rn
1. It performs slower than the GROUP BY
as it needs to go over all the rows to calculate the row number and then discard all subsequent rows on the second pass, but on the plus side you can just do select *
and write the query way faster.
I tend to use this second approach when I am doing interactive things on the web console, and the first one when I am automating a dashboard or anything that will run the query frequently.
Upvotes: 0