YiFei
YiFei

Reputation: 1834

Better to use foreign key or to assign unique ids?

A simplified model of the database is that, say I have a table of A, which has columns a, b, c, d (so that (a, b, c, d) is the primary key). Then I have another table B to store some list-like data for each entry in A, in order to stay with the first normal form.

This B table therefore, will have columns a, b, c, d, e, where each e entry is one element in the list. It is natural to have a foreign key constraint on (a, b, c, d) in B which enforces integrity that every thing must exist in A first then B.

But I wonder if the foreign key constraint will let the database engine to compress or to not duplicate the data storage in B? (In other words, will (a, b, c, d) be stored again verbatim and identical to what is in A?) If no, will assigning each entry in A a unique ID a better choice in this case?

Upvotes: 2

Views: 1608

Answers (2)

Rick James
Rick James

Reputation: 142298

Table Storage: Each MySQL table is stored completely separately. In some cases, two table may live in the same OS file, but the blocks (16KB for InnoDB) will be totally separate. Therefore, (a,b,c,d) shows up in at least 2 places in the dataset -- once in A and once in B.

A FOREIGN KEY has the side effect of creating an extra INDEX is there is not one already there. (In your case, you said it was the PK, so it is already an index.) Note that an FK does not need a UNIQUE index. (In your case, the PK is unique, but that seems irrelevant.)

A secondary index (as opposed to the PRIMARY KEY) for a table is stored in a separate BTree, ordered by the key column(s). So, if (a,b,c,d) had not already been indexed, the FK would lead to an extra copy of (a,b,c,d), namely in the secondary index.

There is one form of compression in InnoDB: You can declare a table to be ROW_FOMAT=COMPRESSED. But this has nothing to do with de-duplicating (a,b,c,d).

Four columns is a lot for a PK, but it is OK. If it is 4 SMALLINT values, then it is only 8 bytes (plus overhead) per row per copy of the PK. If it is a bunch of VARCHARs, then it could be much bulkier.

When should you deliberately add a surrogate id as the PK? In my experience, only about one-third of the cases. (Others will argue.) If you don't have any secondary keys, nor FKs referencing it, then the surrogate is a waste of space and speed. If you have only one secondary key or FK, then the required space is about the same. This last situation is what you described so far.

Table size: If you have a thousand rows, space is not likely to be an issue. A million rows might trigger thinking more seriously about space. For a billion rows, 'pull out all stops'.

PK tips: Don't include DATETIME or TIMESTAMP, someday there will need to be two rows with the same second. Don't put more columns in the PK than are needed for the implicit uniqueness constraint; if you do, you effectively lose that constraint. (There are exceptions.)

Upvotes: 1

nvogel
nvogel

Reputation: 25526

Most SQL-based database engines do require foreign key values to be physically stored at least twice (in the referencing table and in the parent table). It would be nice to have the option not to do that in the case of large foreign keys. Many database designers will choose to avoid large foreign keys, partly because they have this additional overhead.

Most DBMSs do provide the option to compress data - foreign key or not. In many cases that will probably more than compensate for the physical duplication of data due to a foreign key.

Foreign keys are a logical construct however, and in database design it's important to distinguish between logical and physical concerns.

Upvotes: 3

Related Questions