Reputation: 473
I tried an insert query performance test. Here are the numbers:
Postgres
Insert: avg execution time for 10 inserts of 1 million rows: 6260 ms
TimescaleDB
Insert: avg execution time for 10 inserts of 1 million rows: 10778 ms
Insert Queries:
-- Testing SQL Queries
-- Join table
CREATE TABLE public.sensors (
    id SERIAL PRIMARY KEY,
    type VARCHAR(50),
    location VARCHAR(50)
);
-- Postgres table
CREATE TABLE sensor_data (
    time TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER,
    temperature DOUBLE PRECISION,
    cpu DOUBLE PRECISION,
    FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);

CREATE INDEX idx_sensor_id
    ON sensor_data (sensor_id);
-- TimescaleDB table
CREATE TABLE sensor_data_ts (
    time TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER,
    temperature DOUBLE PRECISION,
    cpu DOUBLE PRECISION,
    FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);

SELECT create_hypertable('sensor_data_ts', 'time');
-- Insert Data
INSERT INTO sensors (type, location) VALUES
    ('a', 'floor'),
    ('a', 'ceiling'),
    ('b', 'floor'),
    ('b', 'ceiling');
-- Postgres
EXPLAIN ANALYSE
INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
SELECT
    time,
    sensor_id,
    random() AS cpu,
    random() * 100 AS temperature
FROM generate_series(now() - interval '125 week', now(), interval '5 minute') AS g1(time),
     generate_series(1, 4, 1) AS g2(sensor_id);
-- TimescaleDB
EXPLAIN ANALYSE
INSERT INTO sensor_data_ts (time, sensor_id, cpu, temperature)
SELECT
    time,
    sensor_id,
    random() AS cpu,
    random() * 100 AS temperature
FROM generate_series(now() - interval '125 week', now(), interval '5 minute') AS g1(time),
     generate_series(1, 4, 1) AS g2(sensor_id);
Am I overlooking any optimizations?
Upvotes: 1
Views: 3180
Reputation: 11
With TimescaleDB version 1.7 running on Docker, I was able to insert around 600,000 rows per second on my laptop using https://github.com/vincev/tsdbperf:
$ ./tsdbperf --workers 8 --measurements 200000
[2020-11-03T20:25:48Z INFO tsdbperf] Number of workers: 8
[2020-11-03T20:25:48Z INFO tsdbperf] Devices per worker: 10
[2020-11-03T20:25:48Z INFO tsdbperf] Metrics per device: 10
[2020-11-03T20:25:48Z INFO tsdbperf] Measurements per device: 200000
[2020-11-03T20:26:15Z INFO tsdbperf] Wrote 16000000 measurements in 26.55 seconds
[2020-11-03T20:26:15Z INFO tsdbperf] Wrote 602750 measurements per second
[2020-11-03T20:26:15Z INFO tsdbperf] Wrote 6027500 metrics per second
Upvotes: 1
Reputation: 1912
By default, a hypertable creates one chunk per week (this is configurable in the create_hypertable call). So with the above setup, you created 125 chunks in TimescaleDB, each holding only about 8,000 rows. There is overhead to creating each chunk, as well as to the logic that routes rows into them. With a dataset this small, you are mostly measuring that overhead, which is normally amortized over much larger datasets: in most "natural" settings, we typically see on the order of millions (or at least hundreds of thousands) of rows per chunk.
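For example, you could widen the chunk interval so each chunk holds more rows; a minimal sketch (the 12-week value here is purely illustrative, not a recommendation):

-- Illustrative only: make each chunk cover a longer time range
SELECT create_hypertable('sensor_data_ts', 'time', chunk_time_interval => interval '12 weeks');

-- Or adjust an existing hypertable (applies to chunks created afterwards)
SELECT set_chunk_time_interval('sensor_data_ts', interval '12 weeks');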
The place you start to see insert performance differences between a partitioned architecture like TimescaleDB and a single table is when the dataset (and, in particular, the indexes you are maintaining) no longer fits naturally in memory.
In the above, 1M rows easily fit in memory, and the only index you have on your vanilla PG table is on sensor_id, so it's pretty tiny. (On the TimescaleDB hypertable, you get an index on the timestamp by default, distinct per chunk, so you actually have 125 indexes, each covering only about 8,000 rows.)
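If you want to confirm the chunk count and per-chunk indexes yourself, something along these lines should work (show_chunks is a standard TimescaleDB function):

-- Count the chunks backing the hypertable
SELECT count(*) FROM show_chunks('sensor_data_ts');

-- List the per-chunk indexes Postgres maintains for the default time index
SELECT indexrelid::regclass
FROM pg_index
WHERE indrelid IN (SELECT show_chunks('sensor_data_ts'));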
For a visual, see this older blog post: https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/#result-15x-improvement-in-insert-rate
Note that inserts to the single PG table are roughly the same at the beginning, but then fall off as the table gets bigger and data/indexes start swapping to disk.
If you want to do larger performance tests, I'd suggest trying out the Time Series Benchmark Suite: https://github.com/timescale/tsbs
Upvotes: 5