ankitj

Reputation: 473

When comparing insert performance between Postgres and TimescaleDB, why doesn't TimescaleDB perform as well?

I ran an insert performance test. Here are the numbers:

Postgres

Insert: avg execution time for 10 inserts of 1 million rows: 6260 ms

Timescale

Insert: avg execution time for 10 inserts of 1 million rows: 10778 ms

Insert Queries:

-- Testing SQL Queries

--Join table

CREATE TABLE public.sensors(
  id SERIAL PRIMARY KEY,
  type VARCHAR(50),
  location VARCHAR(50)
);


-- Postgres table
CREATE TABLE sensor_data (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  cpu DOUBLE PRECISION,
  FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);

CREATE INDEX idx_sensor_id
ON sensor_data(sensor_id);

-- TimescaleDB table
CREATE TABLE sensor_data_ts (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER,
  temperature DOUBLE PRECISION,
  cpu DOUBLE PRECISION,
  FOREIGN KEY (sensor_id) REFERENCES sensors (id)
);
SELECT create_hypertable('sensor_data_ts', 'time');

-- Insert Data

INSERT INTO sensors (type, location) VALUES
('a','floor'),
('a', 'ceiling'),
('b','floor'),
('b', 'ceiling');


-- Postgres 
EXPLAIN ANALYSE
INSERT INTO sensor_data (time, sensor_id, cpu, temperature)
SELECT
  time,
  sensor_id,
  random() AS cpu,
  random()*100 AS temperature
FROM generate_series(now() - interval '125 week', now(), interval '5 minute') AS g1(time),
     generate_series(1, 4, 1) AS g2(sensor_id);


-- TimescaleDB
EXPLAIN ANALYSE
INSERT INTO sensor_data_ts (time, sensor_id, cpu, temperature)
SELECT
  time,
  sensor_id,
  random() AS cpu,
  random()*100 AS temperature
FROM generate_series(now() - interval '125 week', now(), interval '5 minute') AS g1(time),
     generate_series(1, 4, 1) AS g2(sensor_id);

Am I overlooking any optimizations?

Upvotes: 1

Views: 3180

Answers (2)

Vincent

Reputation: 11

With TimescaleDB version 1.7 running on Docker, I was able to insert around 600,000 rows per second on my laptop using https://github.com/vincev/tsdbperf:

$ ./tsdbperf --workers 8 --measurements 200000
[2020-11-03T20:25:48Z INFO  tsdbperf] Number of workers:       8
[2020-11-03T20:25:48Z INFO  tsdbperf] Devices per worker:      10
[2020-11-03T20:25:48Z INFO  tsdbperf] Metrics per device:      10
[2020-11-03T20:25:48Z INFO  tsdbperf] Measurements per device: 200000
[2020-11-03T20:26:15Z INFO  tsdbperf] Wrote  16000000 measurements in 26.55 seconds
[2020-11-03T20:26:15Z INFO  tsdbperf] Wrote    602750 measurements per second
[2020-11-03T20:26:15Z INFO  tsdbperf] Wrote   6027500 metrics per second

Upvotes: 1

Mike Freedman

Reputation: 1912

By default, a hypertable creates one chunk per week (this is configurable in the create_hypertable call). So with the above settings, you created 125 chunks for TimescaleDB, each with about 8,000 rows. There is overhead to this chunk creation, as well as to the logic that handles it. With a dataset this small, you are seeing that overhead, which is typically amortized over much larger datasets: in most "natural" settings, we typically see on the order of millions (or at least hundreds of thousands) of rows per chunk.
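As a sketch of that configuration (the 25-week value here is just illustrative; pick an interval that matches your data rate), you could create the hypertable with fewer, larger chunks:

```sql
-- One chunk per 25 weeks instead of the default 1 week, so this
-- 125-week dataset lands in ~5 chunks of ~200,000 rows each.
SELECT create_hypertable(
  'sensor_data_ts',
  'time',
  chunk_time_interval => INTERVAL '25 weeks'
);
```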

The place where you start to see insert-performance differences between a partitioned architecture like TimescaleDB and a single table is when the dataset (and, in particular, the indexes you are currently maintaining) no longer fits naturally in memory.

In the above, 1M rows easily fit in memory, and the only index on your vanilla PG table is on sensor_id, so it's pretty tiny. (On the TimescaleDB hypertable, you by default have an index on the timestamp, distinct per chunk, so you actually have 125 indexes, each covering ~8,000 rows given the distinct timestamps.)
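If your queries don't need that per-chunk time index, you can skip creating it, which removes the index-maintenance cost from inserts (a sketch; whether this is a net win depends on your query patterns):

```sql
-- Don't build the default per-chunk index on the time column;
-- inserts then maintain no indexes on the hypertable.
SELECT create_hypertable(
  'sensor_data_ts',
  'time',
  create_default_indexes => FALSE
);
```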

For a visual, see this older blog post: https://blog.timescale.com/blog/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/#result-15x-improvement-in-insert-rate

Note that the insert rate to a single PG table is about the same at the beginning, but then falls off as the table gets bigger and data/indexes start swapping to disk.

If you want to run larger performance tests, I'd suggest trying the Time Series Benchmark Suite: https://github.com/timescale/tsbs

Upvotes: 5
