Sim
Sim

Reputation: 97

Timescale multidimension partitioning

I created a multi dimensional hypertable with the id as a 2nd dimension :

SELECT create_hypertable(
    '<table>',
    '<tstamp>',
    'id',
    1080,
    migrate_data => true,
    chunk_time_interval => INTERVAL '6 hours'
);

I have 1080 distinct ids in my table. My expectation was to find 1080 chunks, each with all the values for a single id in them. But for some reason, I find only 686 chunks. And When I explore the chunks, I see some with 1, 2 or 3 ids.

Is there something I dont understand ?

Upvotes: 1

Views: 846

Answers (2)

Ivan Sveshnikov
Ivan Sveshnikov

Reputation: 384

It's because you are using 'space partitioning', and it's not going to work the way you want.

If you want different partition for each id, you can use this approach:

SELECT create_hypertable(
    '<table>',
    '<tstamp>',
    migrate_data => false, -- you will need an empty hypertable for the next step
    chunk_time_interval => INTERVAL '6 hours'
);

SELECT add_dimension(
    '<table>',
    'id',
    chunk_time_interval => 1
);

Now you have two-dimensional partitioning, where each distinct id will go to separate partition, and these partitions will be split by time.

Note: the id column must be integer, otherwise it will not work.

Upvotes: 1

Umut TEKİN
Umut TEKİN

Reputation: 962

It does not mean you are going to have 1080 chunks. The number of chunks you will have will chance depending on the timestamp values in your table. The number_partitions means that in a specific timestamp range you can have upmost 1080 different hash partition ranges. Hash partition does not guarantee that you will have exactly the same number of partitions with the number you specified. Different values might end up with in the same partition.

In order to check number of chunks and to show example;

postgres=# insert into testtimescale  select x.i%2160 , '2023-06-18 13:00:00+00'::timestamp, x.i as name from generate_series(1,10800) x(i);
INSERT 0 10800
postgres=# select count(*) from timescaledb_information.chunks where range_start = '2023-06-18 12:00:00+00' and range_end = '2023-06-18 18:00:00+00';
 count
-------
   932
(1 row)

postgres=# insert into testtimescale  select x.i%1234 , '2023-06-18 13:00:00+00'::timestamp, x.i as name from generate_series(1,10800) x(i);
INSERT 0 10800
postgres=# select count(*) from timescaledb_information.chunks where range_start = '2023-06-18 12:00:00+00' and range_end = '2023-06-18 18:00:00+00';
 count
-------
   932
(1 row)

postgres=# insert into testtimescale  select x.i%5000 , '2023-06-18 13:00:00+00'::timestamp, x.i as name from generate_series(1,10800) x(i);
INSERT 0 10800
postgres=# select count(*) from timescaledb_information.chunks where range_start = '2023-06-18 12:00:00+00' and range_end = '2023-06-18 18:00:00+00';
 count
-------
  1068
(1 row)

postgres=# insert into testtimescale  select x.i%7000 , '2023-06-18 13:00:00+00'::timestamp, x.i as name from generate_series(1,10800) x(i);
INSERT 0 10800
postgres=# select count(*) from timescaledb_information.chunks where range_start = '2023-06-18 12:00:00+00' and range_end = '2023-06-18 18:00:00+00';
 count
-------
  1077
(1 row)

postgres=# insert into testtimescale  select x.i%10000 , '2023-06-18 13:00:00+00'::timestamp, x.i as name from generate_series(1,108000) x(i);
INSERT 0 108000
postgres=# select count(*) from timescaledb_information.chunks where range_start = '2023-06-18 12:00:00+00' and range_end = '2023-06-18 18:00:00+00';
 count
-------
  1080
(1 row)

As you can see above when new data loaded into to id column new chunks will be created, but it cannot exceed 1080.

Upvotes: 1

Related Questions