databasedatabase-designentity-relationship

Reputation: 1689

Why does my database table need a primary key?

In my database I have a list of users with information about them, and I also have a feature which allows a user to add other users to a shortlist. My user information is stored in one table with a primary key of the user id, and I have another table for the shortlist. The shortlist table is designed so that it has two columns and is basically just a list of pairs of names. So to find the shortlist for a particular user you retrieve all names from the second column where the id in the first column is a particular value.

The issue is that according to many sources such as this Should each and every table have a primary key? you should have a primary key in every table of the database.

According to this source http://www.w3schools.com/sql/sql_primarykey.asp - a primary key in one which uniquely identifies an entry in a database. So my question is:

What is wrong with the table in my database? Why does it need a primary key?
How should I give it a primary key? Just create a new auto-incrementing column so that each entry has a unique id? There doesn't seem much point for this. Or would I somehow encapsulate the multiple entries that represent a shortlist into another entity in another table and link that in? I'm really confused.

Upvotes: 2

Answers (6)

user2429311

Reputation: 29

Create a composite primary key. To read more about what a composite primary key is, visit http://www.relationaldbdesign.com/relational-database-analysis/module2/concatenated-primary-keys.php

Upvotes: 0

Chris Travers

Reputation: 26454

Well since you are asking, it's good practice but in a few instances (no joins needed to the data) it may not be absolutely required. The biggest problem though is you never really know if requirements will change and so you really want one now so you aren't adding one to a 10m record table after the fact.....

In addition to a primary key (which can span multiple columns btw) I think it is good practice to have a secondary candidate key which is a single field. This makes joins easier.

First some theory. You may remember the definition of a function from HS or college algebra is that y = f(x) where f is a function if and only if for every x there is exactly one y. In this case, in relational math we would say that y is functionally dependent on x on this case.

The same is true of your data. Suppose we are storing check numbers, checking account numbers, and amounts. Assuming that we may have several checking accounts and that for each checking account duplicate check numbers are not allowed, then amount is functionally dependent on (account, check_number). In general you want to store data together which is functionally dependent on the same thing, with no transitive dependencies. A primary key will typically be the functional dependency you specify as the primary one. This then identifies the rest of the data in the row (because it is tied to that identifier). Think of this as the natural primary key. Where possible (i.e. not using MySQL) I like to declare the primary key to be the natural one, even if it spans across columns. This gets complicated sometimes where you may have multiple interchangeable candidate keys. For example, consider:

CREATE TABLE country (
    id serial not null unique,
    name text primary key,
    short_name text not null unique
);

This table really could have any column be the primary key. All three are perfectly acceptable candidate keys. Suppose we have a country record (232, 'United States', 'US'). Each of these fields uniquely identifies the record so if we know one we can know the others. Each one could be defined as the primary key.

I also recommend having a second, artificial candidate key which is just a machine identifier used for linking for joins. In the above example country.id does this. This can be useful for linking other records to the country table.

An exception to needing a candidate key might be where duplicate records really are possible. For example, suppose we are tracking invoices. We may have a case where someone is invoiced independently for two items with one showing on each of two line items. These could be identical. In this case you probably want to add an artificial primary key because it allows you to join things to that record later. You might not have a need to do so now but you may in the future!

Upvotes: 1

Walter Mitty

Reputation: 18940

A table with duplicate rows is not an adequate representation of a relation. It's a bag of rows, not a set of rows. If you let this happen, you'll eventually find that your counts will be off, your sums will be off, and your averages will be off. In short, you'll get confusing errors out of your data when you go to use it.

Declaring a primary key is a convenient way of preventing duplicate rows from getting into the database, even if one of the application programs makes a mistake. The index you obtain is a side effect.

Foreign key references to a single row in a table could be made by referencing any candidate key. However, it's much more convenient if you declare one of those candidate keys as a primary key, and then make all foreign key references refer to the primary key. It's just careful data management.

The one-to-one correspondence between entities in the real world and corresponding rows in the table for that entity is beyond the realm of the DBMS. It's up to your applications and even your data providers to maintain that correspondence by not inventing new rows for existing entities and not letting some new entities slip through the cracks.

Upvotes: 2

Branko Dimitrijevic

Reputation: 52107

In this particular case, there is no purpose in same pair of user IDs being stored more than once in the shortlist table. After all, that table models a set, and an element is either in the set or isn't. Having an element "twice" in the set makes no sense¹. To prevent that, create a composite key, consisting of these two user ID fields.

Whether this composite key will also be primary, or you'll have another key (that would act as surrogate primary key) is another matter, but either way you'll need this composite key.

Please note that under databases that support clustering (aka. index-organized tables), PK is often also a clustering key, which may have significant repercussions on performance.

¹ Unlike in mutiset.

Upvotes: 2

David

Reputation: 1419

If the rows are unique, you can have a two-column primary key, although maybe that's database dependent. Here's an example:

CREATE TABLE my_table
(
col_1 int NOT NULL,
col_2 varchar(255) NOT NULL,
CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)
)

If you already have the table, the example would be:

ALTER TABLE my_table
ADD CONSTRAINT pk_cols12 PRIMARY KEY (col_1,col_2)

Upvotes: 2

lebryant

Reputation: 351

Primary keys must identify each record uniquely and as it was mentioned before, primary keys can consist of multiple attributes (1 or more columns). First, I'd recommend making sure each record is really unique in your table. Secondly, as I understand you left the table without primary key and that's disallowed so yes, you will need to set the key for it.

Upvotes: 2

Why does my database table need a primary key?

Answers (6)

Related Questions