Y Y
Y Y

Reputation: 513

Which column for foreign key: id or any other column and why?

TL;DR

  1. Should a foreign key always refer to the id column of another table? Why or why not? Is there a standard rule for this?
  2. Is there a cost associated with using any other unique column other than id column for foreign key? Performance / storage? How significant? Is it frowned in the industry?

Example: this is the schema for my sample problem:

enter image description here

In my schema sometimes I use id column as the foreign key and sometimes use some other data column.

Which is the correct way?

On a side note, I don't have id columns for the garage_level, garage_spaceid, vehicle_garage_status and vehicle_parking_status tables because they only have one column which is the primary key and the data they store is just at most 15 rows in each table and it is probably never going to change. Should I still have an id column in those ?

Upvotes: 1

Views: 4836

Answers (4)

Alex Yu
Alex Yu

Reputation: 3547

OP question is about "correct way".

I will try to provide some kind of summary from existing comments and answers, general DO and general DONT for FKs.

What was already said

A. "A foreign key has to target a primary key or unique constraint"

Literally from Laurenz Albe answer and it was noted in comments

B. "stick with whatever you think will change the least"

It was noted by Adrian Klavier in comments.

Notes

There is no such general rule that PK or unique constraint must be defined on a single column.

So the question title itself must be corrected: "Which column(s) for foreign key: id or any other column(s) and why?"

Let's talk about "why".

Why: General DO, general DONT and an advice

Is there a cost associated with using any other unique column other than id column for foreign key? Performance / storage? How significant? Is it frowned in the industry?

General DO: Analyze requirements, use logic, use math (arithmetics is enough usually). There is no a single database design that's always good for all cases. Always ask yourself: "Can it be improved?". Never be content with design of existing FKs, if requirements changed or DBMS changed or storage options changed - revise design.

General DONT: Don't think that there is a single correct rule for all cases. Don't think: "if that worked in that database/table than it will work for this case too".

Let me illustrate this points with a common example.

Example: PK on id uuid field

  1. We look into database and see a table has a unique constraint on two fields of types integer (4 bytes) + date (4 bytes)
  2. Additionally: this table has a field id of uuid type (16 bytes)
  3. PK is defined on id
  4. All FKs from other tables are targeting id field

It this a correct design or not?

Case A. Common case - not OK

Let's use math:

  • unique constraint on int+date: it's 4+4=8 bytes
  • data is never changed
  • so it's a good candidate for primary key in this table
  • and nothing prevents to use it for foreign keys in related tables

So it looks like additional 16 bytes per each row + indexes costs is a mistake.

And that's a very common mistake especially in combination of MSSQL + CLUSTERED indexes on random uuids

Is it always a mistake?

No.

Consider latter cases.

Case B. Distributed system - OK

Suppose that you have a distributed system:

  • ServerA, ServerB, ServerC are sources of data
  • HeadServer - is data aggregator
  • data on serverA-ServerC could be duplicated: the same record could exists on several instances
  • aggregated data must not have duplicates
  • data for related tables can come from different instances: data for table with PK from serverA and data for tables with FKs from serverB-serverC
  • you need to log from where each record is originated

In such case existence of PK on id uuid is justified:

  • unique constraint allows to deduplicate records
  • surrogate key allows related data come from different sources

Case C. 'id' is used to expose data through API - OK

Suppose that you have an API to access data for external consumers.

There is a good unique constraint on:

  • client_id: incrementing integer in range 1..100000
  • invoice_date: dates '20100101'..'20210901'

And a surrogate key on id with random uuids.

You can create external API in forms:

  • /server/invoice/{client_id}/{invoice_date}
  • /server/invoice/{id}

From security POV /{id} is superior by reasons:

  • it's impossible to deduce from one uuid value existence of other
  • it's easier to implement authorization system for entities of different types. E.g. entityA has natural key on int, entityB on bigint' and entityC on int+ byte+date`

In such case surrogate key not only justified but becames essential.

Afterword

I hope that I was clear in explanation of main correct principle: "There is no such thing as a universal correct principle".

An additional advice: avoid CASCADE UPDATE/DELETEs: Although it depends on DBMS you use.

But in general :

  • "explicit is better than implicit"
  • CASCADEs rarely works as intended
  • when CASCADES works - usually they have performance problems

Thank you for your attention.

I hope this helps somebody.

Upvotes: 1

SAI PAVAN
SAI PAVAN

Reputation: 64

A foreign key is a field or a column that is used to establish a link between two tables. A FOREIGN KEY is a column (or collection of columns) in one table, that refers to the PRIMARY KEY in another table.

There is no rule that it should refer to a id column but the column it refers to should be the primary key. In real scenarios, it usually refers to Id column as in most cases it is the primary key in the tables.

Upvotes: 1

Pratyush Arora
Pratyush Arora

Reputation: 135

A foreign key is basically a column of a different table(it is always of a different table, since that is the role it serves). It is used to join/ get data from a different table. Think of it like say school is a database and there are many different table for different aspects of student.
say by using Admission number 1234, from accounts table you can get the fees and sports table you can get the sports he play.
Now there is no rule that foreign key should be id column, you can keep it whatever you want. But,to use foreign key you should have a matching column in both tables therefore usually id column is only used. As I stated in the above example the only common thing in say sports table and accounts table would be admission number.

 admn_no | sports     |
+---------+------------+
|    1234 | basketball


+---------+---------+
| admn_no | fees    |
+---------+---------+
|    1234 | 1000000 |
+---------+---------+

Now say using the query\

select * from accounts join sports using (admn_no);

you will get:

+---------+---------+------------+
| admn_no | fees    | sports     |
+---------+---------+------------+
|    1234 | 1000000 | basketball |
+---------+---------+------------+

PS: sorry for bad formatting

Upvotes: 1

Laurenz Albe
Laurenz Albe

Reputation: 246578

A foreign key has to target a primary key or unique constraint. It is normal to reference the primary key, because you typically want to reference an individual row in another table, and the primary key is the identifier of a table row.

From a technical point of view, it does not matter whether a foreign key references the primary key or another unique constraint, because in PostgreSQL both are implemented in the same way, using a unique index.

As to your concrete examples, there is nothing wrong with having the unique size column of vehicle_size be the target of a foreign key, although it begs the question why you didn't make size the primary key and omit the id column altogether. There is no need for each table to have an id column that is the automatically generated numeric primary key, except that there may be ORMs and other software that expect that.

Upvotes: 4

Related Questions