Reputation: 3358
My data set is quite similar to stock prices - I get an average price, symbol, company name, every 5 minutes.
What I need to do is be able to efficiently:
Currently, this is what I have so far:
Stock Table
ID | name | symbol
Unique between name and symbol (name/symbol can be unique on their own)
Indexed on name, symbol (irrelevant here, but indexed for text search)
Stock Ticks Table
ID | stock_id | price | updated_at
All columns non null
For question 1 (Get latest price for a given symbol), I'm already having a bit of trouble - this is essentially similar to other questions:
Get latest comments for each post or essentially a greatest-n-per-group query. Thing is my data is going to get really large (every 5 minutes) so I think this is a good case for preoptimization. Should I add a current_price
column (or current_prices
table)? Or is a group by
/distinct
? How would that query be efficiently written?
For question 2 (Get the time intervaled prices) I'm honestly a bit troubled how to write a query for that. Note that the tricky part is there might be holes in the data, so for example, if the query is:
get every price per day from June 1 to June 10
and say there's no data from June 3, then it should try and find the closest time from that (either past or present)
I am writing this on Phoenix/Ecto, so if you could write it with the ORM that would be a plus but not required.
Upvotes: 0
Views: 62
Reputation: 754
Assuming PostgreSQL v9.6 (you didn't specify).
The stock table definition
CREATE TABLE stock ( id serial NOT NULL PRIMARY KEY,
name text NOT NULL UNIQUE,
symbol text NOT NULL UNIQUE );
and the price table definition
CREATE TABLE pricing ( id int NOT NULL REFERENCES stock (id),
updated_at TIMESTAMP(0) NOT NULL,
price NUMERIC( 10, 2 ) NOT NULL,
PRIMARY KEY (id, updated_at) );
And to accelerate price lookups by date without stock_ids
CREATE INDEX ON pricing (updated_at);
Example stock
values are (1, 'Queen', 'BEE'), (2, 'Team Fox', 'FOX').
Example pricing
values are
(1, '2017-06-17 13:24:59', 12.34),
(1, '2017-06-01 18:00:00', 6.10),
(1, '2017-06-02 17:00:00', 6.20),
(1, '2017-06-03 17:00:00', 6.30),
(2, '2017-06-02 15:00:00', 100.00),
(2, '2017-06-03 15:30:00', 777.00);
To get the latest prices for all symbols
SELECT s.*,
(SELECT price
FROM pricing
WHERE id = s.id
ORDER BY updated_at DESC
LIMIT 1) "latest_price"
FROM stock s
WHERE EXISTS (SELECT id FROM pricing p WHERE p.id = s.id);
which will exclude NULL
values for the latest_price
using the WHERE EXISTS
condition. Leave it out to get nulls for stocks for which no price is known yet.
An alternative to
get the latest prices
is
SELECT *
FROM stock s
JOIN LATERAL
(SELECT price
FROM pricing p
WHERE p.id = s.id
ORDER BY updated_at DESC
LIMIT 1) latest_price ON true ;
And to
get every price per day from June 1 to June 10
generate the appropriate dates first with
WITH RECURSIVE dates (d) AS (
SELECT '2017-06-01 20:00'::timestamp
UNION ALL
SELECT d + interval '24 hours'
FROM dates
WHERE d < '2017-06-04 20:00'::timestamp
)
SELECT d FROM dates ;
which produces
d
---------------------
2017-06-01 20:00:00
2017-06-02 20:00:00
2017-06-03 20:00:00
2017-06-04 20:00:00
You can adapt the following parameters
'2017-06-01 20:00'::timestamp
)'2017-06-04 20:00'::timestamp
) interval '24 hours'
)With the dates use
WITH RECURSIVE dates (d) AS (
SELECT '2017-06-01 20:00'::timestamp
UNION ALL
SELECT d + interval '24 hours'
FROM dates
WHERE d < '2017-06-04 20:00'::timestamp
)
SELECT symbol, CAST( d AS date ) "day", price, updated_at
FROM stock s
CROSS JOIN dates
JOIN LATERAL (
SELECT updated_at, price
FROM pricing p
WHERE p.id = s.id AND p.updated_at <= d
ORDER BY p.updated_at DESC
LIMIT 1
) latest_prices ON true
ORDER BY 1, 2, 4, 3 ;
-- orders same id = name blocks;
-- use 2, 1, 4, 3 to get same date blocks
to get for example
symbol | day | price | updated_at
--------+----------------+--------+---------------------
FOX | 2017-06-02 | 100.00 | 2017-06-02 15:00:00
FOX | 2017-06-03 | 777.00 | 2017-06-03 15:30:00
FOX | 2017-06-04 | 777.00 | 2017-06-03 15:30:00 -- price from 3rd
BEE | 2017-06-01 | 6.10 | 2017-06-01 18:00:00
BEE | 2017-06-02 | 6.20 | 2017-06-02 17:00:00
BEE | 2017-06-03 | 6.30 | 2017-06-03 17:00:00
BEE | 2017-06-04 | 6.30 | 2017-06-03 17:00:00 -- price from 3rd
Upvotes: 1