Reputation: 83576
I have some time-series data that I am about to import into TimescaleDB, as (time, item_id, value) tuples in a hypertable.
I have created an index:
CREATE INDEX ON time_series (item_id, timestamp DESC);
Does TimescaleDB have different performance characteristics when inserting rows in the middle of a time series vs. appending them at the end of the time? I know this is an issue for some native PostgreSQL data structures, such as BRIN indexes.
I am asking because for some item_ids I might have patchy data, and I need to insert those values after other item_ids have filled the tip of the time series. Basically, some items might be old data that is seriously behind the rest of the items.
Upvotes: 3
Views: 1093
Reputation: 24603
I don't think it reacts differently as such. In your case, the insert performance will depend on how the data is ordered when you load it, and this tip is going to help you the most:
If a row with a sufficiently older timestamp is inserted – i.e., it's an out-of-order or backfilled write – the disk pages corresponding to the older chunk (and its indexes) will need to be read in from disk. This will significantly increase write latency and lower insert throughput.
Particularly, when you are loading data for the first time, try to load data in sorted, increasing timestamp order.
Be careful if you're bulk loading data about many different servers, devices, and so forth:
Do not bulk insert data sequentially by server (i.e., all data for server A, then server B, then C, and so forth). This will cause disk thrashing as loading each server will walk through all chunks before starting anew.
Instead, arrange your bulk load so that data from all servers are inserted in loose timestamp order (e.g., day 1 across all servers in parallel, then day 2 across all servers in parallel, etc.)
source: TimeScale blog
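As a rough illustration of the "loose timestamp order" advice, here is a minimal Python sketch that interleaves several per-item streams into one time-ordered stream and cuts it into batches. It assumes each item's rows are already sorted by time and are (time, item_id, value) tuples as in the question. The names merge_by_time and batches are hypothetical helpers, not part of TimescaleDB or any driver, and the actual insert call is left out.

```python
import heapq
from datetime import datetime
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

# Assumed row shape, taken from the question: (time, item_id, value)
Row = Tuple[datetime, int, float]


def merge_by_time(per_item_rows: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Merge per-item streams (each already sorted by time) into one
    stream ordered by time across all items."""
    return heapq.merge(*per_item_rows, key=lambda row: row[0])


def batches(rows: Iterable[Row], size: int) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows, preserving order."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    item_1 = [(datetime(2020, 1, 1), 1, 1.0), (datetime(2020, 1, 3), 1, 3.0)]
    item_2 = [(datetime(2020, 1, 2), 2, 2.0), (datetime(2020, 1, 4), 2, 4.0)]
    for batch in batches(merge_by_time([item_1, item_2]), size=3):
        # Hand each batch to your own insert routine here (not shown).
        print(batch)
```

Each batch then touches a narrow time range, so consecutive inserts should mostly hit the same chunk instead of walking through every chunk once per item. Truly late data for a lagging item_id will still land in older chunks, so the out-of-order cost described in the quote still applies to those rows.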
Upvotes: 3