Reputation: 201
This table started out at short term storage for meter data before it was going to be validated and added to some long term storage tables.
Turns out the clients wants to keep this data for a long time since we saved it and it is growing fast.
create table metering_meterreading
(
id bigserial not null. # Primary Key
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
timestamp timestamp with time zone not null, # BTREE index
value numeric(15, 3) not null,
meter_device_id uuid not null, # FK to meter_device, BTREE index
series_id uuid not null # FK to series, BTREE index
organization_id uuid not null. # FK to org , BTREE index
);
I am planning on dropping the primary key since (org_id, meter_device_id, series_id, timestamp) makes it unique. It was just added by my ORM (django) and I didn't care when we started.
But since I pretty much always want to filter in organization, meter_device, and series to get a range of time series data I am wondering if it would be more efficient to have a multicolumn index on (organization_id, meter_device_id, series_id, timestamp) instead of the separate indexes.
I read somewhere that if I had a range it should be the rightmost in the index.
This is still not an super efficient table for timeseries data, since it will grow large, but I am planning in fixing that by partitioning on range, or maybe even use Timescale. But before partitioning I would like it to be as efficient as possible to look up data in it.
I also saw an example somewhere that used a separate table to identify the metric:
create table metric
(
id
organization_id
meter_device_id
series_id
) UNIQE (organization_id, meter_device_id, series_id)
;
create table metering_meterreading
(
metric_id. bigserial, FK to metric, BTREE index
timestamp timestamp with time zone not null, # BTREE index
value numeric(15, 3) not null,
created_at timestamp with time zone not null,
updated_at timestamp with time zone not null,
);
But I am not sure if that is actually better than just putting them all in table. It might impact ingestion rate since there is another table involved now.
Upvotes: 1
Views: 393
Reputation: 246393
If (org_id, meter_device_id, series_id, timestamp)
uniquely determine a table row, you need to use a multi-column primary key over all of them. So you automatically have a 4-column index on these columns. Just make sure that timestamp
is last in the list, then that index will support your query ideally.
Upvotes: 2