PostgreSQL with TimescaleDB only uses a single core during index creation

Question

we have a PostgreSQL hypertable with a few billion rows and we're trying to create a unique index on top of it like so:

CREATE UNIQUE INDEX device_data__device_id__value_type__timestamp__idx ON public.device_data(device_id, value_type, "timestamp" DESC);

We created the hypertable like this:

SELECT create_hypertable('device_data', 'timestamp');

Since we want to create the index as fast as possible, we'd like to parallelize the index creation, and followed this guide.

We tested various settings for work_mem, maintenance_work_mem, max_worker_processes, max_parallel_maintenance_workers, and max_parallel_workers. We also set the parallel_workers setting on our table: ALTER TABLE device_data SET (parallel_workers = 10);. But no matter what we do, the index creation always only uses a single core (we have 16 available), and therefore the creation takes very long.

Any idea what we might be missing here?

Our PostgreSQL version is 12.5 and the server runs Ubuntu 18.

davidk · Accepted Answer

Unfortunately, Timescale doesn't currently support parallel index creation. I would recommend filing a Github issue asking to support it. It is a bit of a heavy lift and might not get prioritized horribly quickly. I think another option that could be useful would be to take the https://docs.timescale.com/latest/api#create_index transaction_per_chunk option here and allow the user to control how the indexes are created, so a simple api that would create the index for all future chunks, but not on older chunks and then allow you to call create_index(chunk_name, ht_index_name) on all the chunks, then you could parallelize that operation in your own code. This ends up being a much simpler lift because the transactionality of the parallel index creation is the hardest part.

PostgreSQL with TimescaleDB only uses a single core during index creation

Answers (1)

Related Questions