xaroulis gekas

Reputation: 165

Benchmark insert rate on TimescaleDB vs PostgreSQL

The first thing I did was create a table, insertion_test. Then I converted it to a hypertable with:

SELECT create_hypertable('insertion_test', 'timestamp1');
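
For reference, here is a minimal sketch of what the table definition might look like (the actual columns aren't shown above; timestamp1 plus a single value column are assumed):

CREATE TABLE insertion_test (
    timestamp1 TIMESTAMPTZ NOT NULL,   -- time column used for partitioning
    value      DOUBLE PRECISION        -- assumed payload column
);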

I tried to benchmark insertion by importing 10 files (each file contains 100,000 rows) with timestamp1 values ranging from 2010-01-01 00:05:00 to 2020-01-01 00:05:00, and the result was that PostgreSQL was way faster. Why is this happening?
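For example, each file can be loaded from psql with something like this (assuming CSV input and the column layout sketched above; the exact import command may differ):

\copy insertion_test FROM 'data_01.csv' WITH (FORMAT csv)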

Upvotes: 1

Views: 1161

Answers (1)

Mike Freedman

Reputation: 1912

Please read this section about "best practices" for defining the range of chunks:

https://docs.timescale.com/timescaledb/latest/how-to-guides/hypertables/best-practices/

In short: Your chunks are much too small, leading to excessive overhead on the database during a bulk insert, which decreases performance for a (small) insert compared to a single table.

The database default is 1 week (you can specify a different interval as an optional argument to create_hypertable), which means that if you are inserting a 100,000-row file spanning 2010 to 2020, it's creating ~520 chunks for just those 100,000 rows, or fewer than 200 rows per chunk. Even with 10 files, that's still under 2,000 rows per chunk. We typically recommend on the order of millions of rows per chunk.
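As a rough sketch of the fix (reusing the table and time column from the question), the chunk interval can be widened when the hypertable is created, so ten years of data lands in roughly 10 chunks instead of ~520:

-- One-year chunks: ~10 chunks for 2010-2020 instead of ~520 one-week chunks
SELECT create_hypertable('insertion_test', 'timestamp1',
                         chunk_time_interval => INTERVAL '1 year');

For an existing hypertable, set_chunk_time_interval changes the interval used for chunks created afterwards.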

Additionally, by loading 10 files that all cover the same time range, you are causing the second insert to "start at the beginning of time" again. Note that given your total data volume is small -- 1 million rows -- it won't ever hit the "disk thrashing" issue discussed in the post below, but it's something to consider for larger inserts:

https://blog.timescale.com/blog/13-tips-to-improve-postgresql-insert-performance/

Upvotes: 4
