Reputation: 13
I am trying out TimescaleDB to store price data. The price table schema is as follows:
CREATE TABLE prices(
time TIMESTAMPTZ NOT NULL,
pid VARCHAR(1024) NOT NULL,
price DOUBLE PRECISION NOT NULL
)
We need to keep the price data for at most 183 days. The database we currently use for the price data is MongoDB. We already have 9 billion records in MongoDB, and it is really slow both to insert new records and to remove records older than 183 days.
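(For what it's worth, in TimescaleDB the 183-day retention can be handled by dropping whole chunks rather than deleting individual rows, which is much cheaper than a row-by-row remove. A sketch; the exact `drop_chunks` signature differs between TimescaleDB 1.x and 2.x, the 2.x form is shown:)

```sql
-- One-off: drop all chunks whose data is entirely older than 183 days
SELECT drop_chunks('prices', INTERVAL '183 days');

-- Or schedule it with the built-in retention policy (TimescaleDB 2.x)
SELECT add_retention_policy('prices', INTERVAL '183 days');
```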
The query is pretty simple. Given the pid and a date range, for example 7 days, the query returns a tuple containing the average price, max price and min price. The RPS for this query is around 20.
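For reference, that query can be written as follows (the pid value and the 7-day window are placeholders):

```sql
SELECT avg(price) AS avg_price,
       max(price) AS max_price,
       min(price) AS min_price
FROM prices
WHERE pid = 'some-pid'
  AND time >= now() - INTERVAL '7 days';
```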
Plus, we curate around 30 million price records daily, which are inserted into MongoDB every morning around 2 AM. That insertion takes at least 4 hours to complete.
While testing TimescaleDB for this use case, I of course enabled the hypertable, set the chunk interval to 24 hours, and created an index on pid. This is what I have done so far, and it completes the insertion of 30 million records within 20 minutes using Python's psycopg2 copy_from. I would like to know what other caveats or suggestions you have for planning out TimescaleDB in an optimal way for this case.
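(For concreteness, the bulk-load path looks roughly like this. The table and column names come from the schema above; the `make_copy_buffer` helper and the connection string are illustrative assumptions, not the asker's actual code:)

```python
# Sketch of bulk-loading price records via psycopg2's copy_from,
# which issues a Postgres COPY ... FROM STDIN under the hood.
import io
from datetime import datetime, timezone

def make_copy_buffer(records):
    """Render (time, pid, price) tuples as a tab-separated buffer for COPY."""
    buf = io.StringIO()
    for ts, pid, price in records:
        buf.write(f"{ts.isoformat()}\t{pid}\t{price}\n")
    buf.seek(0)
    return buf

records = [
    (datetime(2020, 5, 1, 2, 0, tzinfo=timezone.utc), "p1", 9.99),
    (datetime(2020, 5, 1, 2, 0, tzinfo=timezone.utc), "p2", 19.5),
]
buf = make_copy_buffer(records)

# With a live connection (standard psycopg2 usage; DSN is an assumption):
# import psycopg2
# conn = psycopg2.connect("dbname=prices_db")
# with conn, conn.cursor() as cur:
#     cur.copy_from(buf, "prices", columns=("time", "pid", "price"))
```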
p.s. I did try InfluxDB, but it didn't really perform well once series cardinality went over a million or so.
Upvotes: 1
Views: 929
Reputation: 1912
(Timescale co-founder)
A few suggestions:
- Build your index on (pid, timestamp DESC), not just pid.
- Use continuous aggregates to precompute the rollups you query: https://docs.timescale.com/latest/using-timescaledb/continuous-aggregates
- Enable native compression to shrink older chunks: https://docs.timescale.com/latest/using-timescaledb/compression
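As a sketch of the first two suggestions (the view name prices_daily is illustrative, and the continuous-aggregate syntax shown is the TimescaleDB 2.x form; 1.x used a slightly different CREATE VIEW variant):

```sql
-- Composite index: per-pid time-range scans touch only that pid's slice, newest first
CREATE INDEX ON prices (pid, time DESC);

-- Continuous aggregate precomputing daily avg/max/min per pid
CREATE MATERIALIZED VIEW prices_daily
WITH (timescaledb.continuous) AS
SELECT pid,
       time_bucket('1 day', time) AS bucket,
       avg(price) AS avg_price,
       max(price) AS max_price,
       min(price) AS min_price
FROM prices
GROUP BY pid, bucket;
```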
Also, you are welcome to join the TimescaleDB community on Slack for these types of questions: https://slack.timescale.com/
Upvotes: 4