Andreas Grech

Reputation: 107950

SQL: Do you need an auto-incremental primary key for Many-Many tables?

Say you have a Many-Many table between Artists and Fans. When it comes to designing the table, do you design it like this:

ArtistFans
    ArtistFanID (PK)
    ArtistID (FK)
    UserID (FK)

 (ArtistID and UserID will then be constrained with a Unique Constraint 
  to prevent duplicate data) 

Or do you use a compound PK for the two relevant fields:

ArtistFans
    ArtistID (PK)
    UserID (PK)

(The need for the separate unique constraint is removed because of the 
 compound PK)

Are there any advantages (maybe indexing?) to using the former schema?
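For concreteness, here is a minimal sketch of the two variants, run against an in-memory SQLite database from Python. The table names ArtistFansSurrogate and ArtistFansCompound and the helper rejects_duplicate are made up for illustration, the foreign keys to the parent tables are left out for brevity, and other engines spell the auto-increment column differently. It only shows that both designs reject a duplicate (ArtistID, UserID) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Variant 1: surrogate key plus a separate unique constraint
    CREATE TABLE ArtistFansSurrogate (
        ArtistFanID INTEGER PRIMARY KEY AUTOINCREMENT,
        ArtistID    INTEGER NOT NULL,
        UserID      INTEGER NOT NULL,
        UNIQUE (ArtistID, UserID)
    );
    -- Variant 2: compound primary key, no extra column
    CREATE TABLE ArtistFansCompound (
        ArtistID INTEGER NOT NULL,
        UserID   INTEGER NOT NULL,
        PRIMARY KEY (ArtistID, UserID)
    );
""")

def rejects_duplicate(table):
    """Insert the same (ArtistID, UserID) pair twice; True if the second insert fails."""
    conn.execute(f"INSERT INTO {table} (ArtistID, UserID) VALUES (1, 42)")
    try:
        conn.execute(f"INSERT INTO {table} (ArtistID, UserID) VALUES (1, 42)")
    except sqlite3.IntegrityError:
        return True
    return False

results = {
    table: rejects_duplicate(table)
    for table in ("ArtistFansSurrogate", "ArtistFansCompound")
}
```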

Upvotes: 17

Views: 8536

Answers (7)

Utku Özdemir

Reputation: 7725

In my opinion, in pure SQL an id column is not necessary and should not be used. But with ORM frameworks such as Hibernate, managing many-to-many relations with compound keys is not simple, especially if the join table has extra columns.

So if I am going to use an ORM framework on the db, I prefer adding an auto-increment id column to that table and a unique constraint across the referencing columns together. And of course, a not-null constraint if it is required.

Then I treat the table just like any other table in my project.

Upvotes: 1

gbn

Reputation: 432271

ArtistFans
    ArtistID (PK)
    UserID (PK)

The use of an auto incremental PK has no advantages here, even if the parent tables have them.

I'd also create a "reverse PK" index on (UserID, ArtistID) as a matter of course: you will need it because you'll query the table by both columns.
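A rough sketch of what the reverse index buys you, using SQLite through Python purely as a convenient stand-in (the index name IX_ArtistFans_User and the helper plan_for_user_lookup are made up here, and other engines report their plans differently). It compares the query plan for a lookup by UserID before and after the second index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ArtistFans (
        ArtistID INTEGER NOT NULL,
        UserID   INTEGER NOT NULL,
        PRIMARY KEY (ArtistID, UserID)
    )
""")

def plan_for_user_lookup():
    """Return the query-plan text for finding every artist a given user follows."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT ArtistID FROM ArtistFans WHERE UserID = ?",
        (42,),
    ).fetchall()
    # The human-readable plan description is the last column of each row.
    return " ".join(row[-1] for row in rows)

plan_before = plan_for_user_lookup()

# The "reverse PK" index: same columns as the PK, opposite order.
conn.execute(
    "CREATE UNIQUE INDEX IX_ArtistFans_User ON ArtistFans (UserID, ArtistID)"
)

plan_after = plan_for_user_lookup()
```

Printing plan_before and plan_after should show the second plan naming the new index, since UserID is its leading column; the PK index alone leads with ArtistID and so cannot be used for a direct seek on UserID.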

Autonumber/ID columns have their place. You'd choose them to improve certain things after the normalisation process based on the physical platform. But not for link tables: if your braindead ORM insists, then change ORMs...

Edit, Oct 2012

It's important to note that you'd still need unique (UserID, ArtistID) and (ArtistID, UserID) indexes. Adding an auto-increment column just uses more space (in memory, not just on disk) that shouldn't be used.

Upvotes: 21

Isabelle Wedin

Reputation: 1365

Assuming that you're already a devotee of the surrogate key (you're in good company), there's a case to be made for going all the way.

A key point that is sometimes forgotten is that relationships themselves can have properties. Often it's not enough to state that two things are related; you might have to describe the nature of that relationship. In other words, there's nothing special about a relationship table that says it can only have two columns.

If there's nothing special about these tables, why not treat them like every other table and use a surrogate key? If you do end up having to add properties to the table, you'll thank your lucky presentation layers that you don't have to pass around a compound key just to modify those properties.
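As a small illustration of that point, here is a sketch in Python with SQLite where the relationship carries one extra property. The Notifications column is invented for the example, and the unique constraint on the pair is kept alongside the surrogate key. The update needs only a single value to identify the row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ArtistFans (
        ArtistFanID   INTEGER PRIMARY KEY AUTOINCREMENT,
        ArtistID      INTEGER NOT NULL,
        UserID        INTEGER NOT NULL,
        Notifications INTEGER NOT NULL DEFAULT 0,  -- hypothetical relationship property
        UNIQUE (ArtistID, UserID)
    );
""")

# Create the relationship and remember its surrogate key.
cursor = conn.execute("INSERT INTO ArtistFans (ArtistID, UserID) VALUES (1, 42)")
artist_fan_id = cursor.lastrowid

# Modify the property using the single-column key only.
conn.execute(
    "UPDATE ArtistFans SET Notifications = 1 WHERE ArtistFanID = ?",
    (artist_fan_id,),
)

# The same row is still reachable through the compound (ArtistID, UserID) pair.
row = conn.execute(
    "SELECT Notifications FROM ArtistFans WHERE ArtistID = ? AND UserID = ?",
    (1, 42),
).fetchone()
```

With a compound key the UPDATE would need both ArtistID and UserID in its WHERE clause, which is the extra baggage the presentation layer would have to carry.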

I wouldn't even call this a rule of thumb, more of a something-to-consider. In my experience, some slim majority of relationships end up carrying around additional data, essentially becoming entities in themselves, worthy of a surrogate key.

The rub is that adding these keys after the fact can be a pain. Whether the cost of the additional column and index is worth the value of preempting this headache, that really depends on the project.

As for me, once bitten, twice shy – I go for the surrogate key out of the gate.

Upvotes: 11

devio

Reputation: 37215

Funny how all answers favor variant 2, so I have to dissent and argue for variant 1 ;)

To answer the question in the title: no, you don't need it. But...

Having an auto-incremental or identity column in every table simplifies your data model so that you know that each of your tables always has a single PK column.

As a consequence, every relation (foreign key) from one table to another always consists of a single column for each table.

Further, if you happen to write some application framework for forms, lists, reports, logging, etc., you only have to deal with tables with a single PK column, which simplifies your framework.

Also, an additional id PK column does not cost you very much in terms of disk space (except for billion-record-plus tables).

Of course, I need to mention one downside: in a grandparent-parent-child relation, the child loses its grandparent information and requires a JOIN to retrieve it.
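To make that downside concrete, here is a sketch with an assumed Artist > Album > Track hierarchy, in Python with SQLite. All of these tables are made up for illustration; the C suffix marks the compound-key versions. With single-column surrogate keys the track only knows its album, so getting the artist takes a JOIN. With compound keys the artist id travels down into the track table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate-key style: each table has its own single-column PK
    CREATE TABLE Album (
        AlbumID  INTEGER PRIMARY KEY,
        ArtistID INTEGER NOT NULL
    );
    CREATE TABLE Track (
        TrackID INTEGER PRIMARY KEY,
        AlbumID INTEGER NOT NULL REFERENCES Album (AlbumID)
    );
    INSERT INTO Album VALUES (10, 1);
    INSERT INTO Track VALUES (100, 10);

    -- Compound-key style: the child key contains the grandparent key
    CREATE TABLE AlbumC (
        ArtistID INTEGER NOT NULL,
        AlbumNo  INTEGER NOT NULL,
        PRIMARY KEY (ArtistID, AlbumNo)
    );
    CREATE TABLE TrackC (
        ArtistID INTEGER NOT NULL,
        AlbumNo  INTEGER NOT NULL,
        TrackNo  INTEGER NOT NULL,
        PRIMARY KEY (ArtistID, AlbumNo, TrackNo),
        FOREIGN KEY (ArtistID, AlbumNo) REFERENCES AlbumC (ArtistID, AlbumNo)
    );
    INSERT INTO AlbumC VALUES (1, 1);
    INSERT INTO TrackC VALUES (1, 1, 1);
""")

# Surrogate keys: the artist of a track is only reachable through Album.
with_join = conn.execute("""
    SELECT a.ArtistID
    FROM Track t
    JOIN Album a ON a.AlbumID = t.AlbumID
    WHERE t.TrackID = 100
""").fetchone()

# Compound keys: the artist id is already a column of the track row.
without_join = conn.execute(
    "SELECT ArtistID FROM TrackC WHERE AlbumNo = 1 AND TrackNo = 1"
).fetchone()
```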

Upvotes: 1

Andomar

Reputation: 238086

Even if you create an identity column, it doesn't have to be the primary key.

ArtistFans
    ArtistFanId
    ArtistId (PK)
    UserId (PK)

Identity columns can be useful to relate this relation to other relations. For example, if there was a creator table which specified the person who created the artist-user relation, it could have a foreign key on ArtistFanId, instead of the composite ArtistId+UserId primary key.
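A sketch of that arrangement in Python with SQLite, under a few assumptions: the ArtistFanCreators table and its CreatedByUser column are invented for the example, and the ArtistFanId values are supplied by hand because SQLite only auto-increments an INTEGER PRIMARY KEY column (in SQL Server this would be an IDENTITY column). The point is just that a foreign key can target a unique non-PK column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE ArtistFans (
        ArtistFanId INTEGER NOT NULL UNIQUE,  -- identity-style column, not the PK
        ArtistId    INTEGER NOT NULL,
        UserId      INTEGER NOT NULL,
        PRIMARY KEY (ArtistId, UserId)
    );
    -- Hypothetical "creator" table pointing at the single-column id
    CREATE TABLE ArtistFanCreators (
        ArtistFanId   INTEGER NOT NULL REFERENCES ArtistFans (ArtistFanId),
        CreatedByUser INTEGER NOT NULL
    );
    INSERT INTO ArtistFans VALUES (1, 7, 42);
""")

# References an existing artist-fan row: accepted.
conn.execute("INSERT INTO ArtistFanCreators VALUES (1, 99)")

# References a non-existent artist-fan row: rejected by the foreign key.
try:
    conn.execute("INSERT INTO ArtistFanCreators VALUES (2, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```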

Also, identity columns are required by (or greatly improve the operation of) certain ORM packages.

Upvotes: 5

TheTXI

Reputation: 37895

The standard way is to use the composite primary key. Adding a separate autoincrement key just creates a substitute for a key you already have in the existing columns. Proper database normalization patterns would look down on using the autoincrement.

Upvotes: 1

Brian Campbell

Reputation: 332846

I cannot think of any reason to use the first form you list. The compound primary key is fine, and having a separate, artificial primary key (along with the unique constraint you need on the foreign keys) will just take more time to compute and space to store.

Upvotes: 2
