kevinjom

Reputation: 353

Postgres - very slow to create EXCLUDE constraint

We have a table foo, its schema looks like the following

hi=# \d foo
                       Table "public.foo"
   Column   |           Type           |       Modifiers
------------+--------------------------+------------------------
 id         | uuid                     | not null
 bar_id     | uuid                     | not null
 hi         | character varying(128)   | not null
 yo         | character varying(4000)  |
 updated_at | timestamp with time zone | not null default now()
 created_at | timestamp with time zone | not null default now()
Indexes:
    "foo$pk" PRIMARY KEY, btree (id)
    "foo$uk" UNIQUE CONSTRAINT, btree (bar_id, hi, yo)
Foreign-key constraints:
    "foo$bar$fk" FOREIGN KEY (bar_id) REFERENCES bar(id)

We have around 100M records in it and, as you can see, there is a UNIQUE constraint on this table. For business reasons we want to replace it with EXCLUDE constraints. The change we want to make is shown below:

ALTER TABLE foo ADD CONSTRAINT "foo$one$uk"
EXCLUDE ( bar_id WITH =, hi WITH =, yo WITH =) WHERE (hi = 'Tom') DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE foo ADD CONSTRAINT "foo$two$uk"
EXCLUDE ( bar_id WITH =, hi WITH =) WHERE (hi = 'Lisa') DEFERRABLE INITIALLY DEFERRED;
ALTER TABLE foo DROP CONSTRAINT IF EXISTS "foo$uk";

Running these 3 statements on an AWS RDS instance (m3.large + 300 GB general purpose SSD) took about 12 hours. We also noticed that the first statement consumed almost all of that time, the second one was fast (within minutes), and the third one returned immediately. What is happening behind the scenes, and why does it happen that way?

Upvotes: 4

Views: 1834

Answers (1)

Disillusioned

Reputation: 14832

Whenever you add a constraint, the existing data must be checked to ensure there are no violations.

Exclusion constraints are described in the documentation (section 5.3.6) as constraints that:

ensure that if any two rows are compared on the specified columns or expressions using the specified operators, at least one of these operator comparisons will return false or null.

So depending on how many rows have hi = 'Tom', you may have performed an O(n²) operation over 100 million rows. Yes, that will take a while.
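
Since all three operators in the first constraint are =, any pre-existing violation is simply a duplicate (bar_id, hi, yo) combination among the rows WHERE hi = 'Tom'. As a sketch (mine, not anything Postgres runs internally), you could gauge how many rows are involved, and whether the ALTER TABLE would fail, before paying for the long run:

-- How many rows does the first constraint have to examine, and do any
-- duplicates already exist among them (which would make the ALTER TABLE fail)?
SELECT bar_id, hi, yo, count(*) AS dup_count
FROM foo
WHERE hi = 'Tom'
GROUP BY bar_id, hi, yo
HAVING count(*) > 1;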

Note also that:

Adding an exclusion constraint will automatically create an index of the type specified in the constraint declaration.

This has some overhead, but not as much as comparing each pair of existing rows.


As for the 2nd constraint, I'm not certain, but there are 2 possibilities for why it ran faster.

Either there are significantly fewer rows WHERE hi = 'Lisa', or the engine can leverage the fact that the previous constraint has already been checked in order to check the new constraint more efficiently.
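
To test the first possibility you could simply compare how many rows each partial constraint covers; this is only a sketch using the literal values from the constraints above:

-- Compare the number of rows each EXCLUDE constraint actually has to check.
SELECT hi, count(*) AS rows_covered
FROM foo
WHERE hi IN ('Tom', 'Lisa')
GROUP BY hi;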

Obviously the 3rd change, dropping a constraint, doesn't need to check anything.


Side Note

You might have the option of disabling the constraint check when it's created. (I don't know if this is supported in PostgreSQL.)

  • This allows you to ignore existing constraint violations but ensure the constraint is checked going forward.
  • This would have the side-effect of speeding up the constraint creation quite significantly.
  • Of course, this also means the constraint hasn't been "verified", so the engine cannot "trust" the integrity of the constraint for any performance benefits that might otherwise be achievable.
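
For what it's worth, PostgreSQL does expose this pattern through NOT VALID / VALIDATE CONSTRAINT, but as far as I know only for CHECK and foreign-key constraints, not for exclusion constraints. A sketch of the mechanism on a hypothetical CHECK constraint:

-- Add the constraint without scanning existing rows (constraint name is made up).
ALTER TABLE foo ADD CONSTRAINT "foo$hi_not_blank$ck" CHECK (hi <> '') NOT VALID;

-- Later, scan the table once so the engine can trust the constraint going forward.
ALTER TABLE foo VALIDATE CONSTRAINT "foo$hi_not_blank$ck";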

Upvotes: 6
