Reputation: 16455
I'm relatively new to designing a database, and I'm wondering what the canonical way to implement a many to many relationship between rows in the same table is.
In my case I have a table of formulas and I want to say that two formulas in the table are related:
The formulas table:
formula_id SERIAL PRIMARY KEY
name TEXT NOT NULL
formula TEXT NOT NULL
I assume I would make a new table called related_formulas and then do something like:
formula_relation_id SERIAL PRIMARY KEY
formula_id INT REFERENCES formulas (formula_id) ON DELETE CASCADE
formula_id2 INT REFERENCES formulas (formula_id) ON DELETE CASCADE
but then I foresee problems such as preventing two ids in the same row from having the same value. I'm sure there are also other potential problems that I don't see due to my own inexperience.
Could someone point me in the right direction?
Upvotes: 0
Views: 4221
Reputation: 864
Amadan's answer is good to ensure the data is inserted in one canonical way, however if you prefer not to restrict your db users to a specific order when inserting the formulas (which was imposed by CHECK (formula1_id < formula2_id)
in Amadan's answer), you can consider:
CREATE TABLE formula (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
formula_name text NOT NULL,
formula text NOT NULL
);
CREATE TABLE formula_formula_relation (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
formula1_id int NOT NULL REFERENCES formula ON DELETE CASCADE,
formula2_id int NOT NULL REFERENCES formula ON DELETE CASCADE,
CHECK (formula1_id <> formula2_id),
CONSTRAINT already_related_formulas_not_allowed_again EXCLUDE USING gist (
LEAST (formula1_id, formula2_id) WITH =,
GREATEST (formula1_id, formula2_id) WITH =
)
);
(You might need to run CREATE EXTENSION btree_gist;
)
Upvotes: 1
Reputation: 198526
From SERIAL
I assumed PostgreSQL...
CREATE TABLE formula_relation (
formula_relation_id SERIAL PRIMARY KEY,
formula1_id INT REFERENCES formulas (formula_id) ON DELETE CASCADE,
formula2_id INT REFERENCES formulas (formula_id) ON DELETE CASCADE,
CHECK (formula1_id < formula2_id)
);
(I also assumed that your relation is symmetric, so i
being related to A[i]
also implies A[i]
is related to i
; thus, having formula1_id < formula2_id
ensures there can only be one, canonical, variant of the row, and you don't need to check for the reverse pairing. If the relation is not symmetric, you should just CHECK (formula1_id != formula2_id)
.)
Upvotes: 5