Reputation: 6318
I have a big database that represents a graph with a ton of data in it that is constantly growing. The database looks something like:
CREATE TABLE node (
id BIGSERIAL PRIMARY KEY,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
status TEXT NOT NULL DEFAULT 'new',
-- ... other columns ...
data JSONB
);
CREATE TABLE edge (
parent BIGINT REFERENCES node(id),
child BIGINT REFERENCES node(id)
);
-- ...
-- a few other tables with foreign keys to node
-- ...
Now that I have a ton of data in my database, certain queries are starting to bog down quite a bit so I would like to be able to move nodes that I want to "archive" to a separate table partition that wouldn't get read for any time-sensitive queries. I would still be able to query them easily for historical analysis though.
Ideally I would add an archived BOOL NOT NULL DEFAULT false
column to my node
table and then change it to be PARTITION BY LIST (archived)
.
Unfortunately, Postgres partioning won't let me have a uniqueness constraint on just node.id
, and without a uniqueness constraint I can't use foreign key REFERENCES
in my edge
table.
I could remove the REFERENCES
foreign key relationship and just live dangerously but I really don't love that.
I tried adding the archived
column and building a BRIN
index on it, but Postgres refuses to use it. I suppose I could CLUSTER node USING archived_idx
but finding outage time for the ACCESS EXCLUSIVE
lock it requires would be tough.
Are there any other strategies for partitioning data (or something similar) for tables that require foreign keys to it?
Upvotes: 2
Views: 151
Reputation: 127426
First of all, 40 million records is not a lot in itself. However, it can become a burden when these JSON objects are large, or most data is already archived.
You can improve this by generating the primary key on the partition and not on the parent:
CREATE TABLE node
(
id BIGINT GENERATED ALWAYS AS IDENTITY, -- use an identity
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
status TEXT NOT NULL DEFAULT 'new',
archived BOOLEAN NOT NULL DEFAULT FALSE,
-- ... other columns ...
data JSONB
) PARTITION BY LIST (archived);
CREATE TABLE node_current PARTITION OF node
(
CONSTRAINT pkey_node_current
PRIMARY KEY (id)
)
FOR VALUES IN (FALSE);
CREATE TABLE node_archived PARTITION OF node
(
CONSTRAINT pkey_node_archived
PRIMARY KEY (id)
)
FOR VALUES IN (TRUE);
CREATE TABLE edge
(
parent BIGINT REFERENCES node_current (id),
child BIGINT REFERENCES node_current (id)
);
There are other better solutions to track different versions of the record if you make updates. But that depends on your requirements.
Upvotes: 1