rcv
rcv

Reputation: 6318

Postgres partitioning with a primary key

I have a big database that represents a graph with a ton of data in it that is constantly growing. The database looks something like:

CREATE TABLE node (
  id BIGSERIAL PRIMARY KEY,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  status TEXT NOT NULL DEFAULT 'new',
  -- ... other columns ...
  data JSONB
);

CREATE TABLE edge (
  parent BIGINT REFERENCES node(id),
  child BIGINT REFERENCES node(id)
);

-- ...
-- a few other tables with foreign keys to node
-- ...

Now that I have a ton of data in my database, certain queries are starting to bog down quite a bit so I would like to be able to move nodes that I want to "archive" to a separate table partition that wouldn't get read for any time-sensitive queries. I would still be able to query them easily for historical analysis though.

Ideally I would add an archived BOOL NOT NULL DEFAULT false column to my node table and then change it to be PARTITION BY LIST (archived).

Unfortunately, Postgres partioning won't let me have a uniqueness constraint on just node.id, and without a uniqueness constraint I can't use foreign key REFERENCES in my edge table.

I could remove the REFERENCES foreign key relationship and just live dangerously but I really don't love that.

I tried adding the archived column and building a BRIN index on it, but Postgres refuses to use it. I suppose I could CLUSTER node USING archived_idx but finding outage time for the ACCESS EXCLUSIVE lock it requires would be tough.

Are there any other strategies for partitioning data (or something similar) for tables that require foreign keys to it?

Upvotes: 2

Views: 151

Answers (1)

Frank Heikens
Frank Heikens

Reputation: 127426

First of all, 40 million records is not a lot in itself. However, it can become a burden when these JSON objects are large, or most data is already archived.

You can improve this by generating the primary key on the partition and not on the parent:

CREATE TABLE node
(
    id         BIGINT GENERATED ALWAYS AS IDENTITY, -- use an identity
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    status     TEXT        NOT NULL DEFAULT 'new',
    archived   BOOLEAN     NOT NULL DEFAULT FALSE,
    -- ... other columns ...
    data       JSONB
) PARTITION BY LIST (archived);

CREATE TABLE node_current PARTITION OF node
    (
        CONSTRAINT pkey_node_current
            PRIMARY KEY (id)
        )
    FOR VALUES IN (FALSE);

CREATE TABLE node_archived PARTITION OF node
    (
        CONSTRAINT pkey_node_archived
            PRIMARY KEY (id)
        )
    FOR VALUES IN (TRUE);

CREATE TABLE edge
(
    parent BIGINT REFERENCES node_current (id),
    child  BIGINT REFERENCES node_current (id)
);

There are other better solutions to track different versions of the record if you make updates. But that depends on your requirements.

Upvotes: 1

Related Questions