Is keeping an AUTO_INCREMENT field an inappropriate design?

Question

In a database, I saw many tables in which the primary key (PK) is an AUTO_INCREMENT column.

Suppose I have a table Children created as follows:

CREATE TABLE Children(
  childNo INTEGER AUTO_INCREMENT NOT NULL PRIMARY KEY,
  name    VARCHAR(25),
  age     INTEGER,
  address VARCHAR(100)
);

childNo is AUTO_INCREMENT, but once I have inserted a row, how do I know which value was assigned to a given child (for a given name)? That seems to make it a bad choice for a PK.
If I search on the child's name instead, it would be inefficient (and the name is not guaranteed to be unique). For this reason, I think using an AUTO_INCREMENT column as the primary key denotes weak schema design?
Suppose I have another table, Parents, and there I need to keep childNo as a foreign key (FK). Then it would be complex.
If there is a recursive association, then making the PK an AUTO_INCREMENT column would be even worse.

Does keeping an auto-increment field in a relation mean that the normalization is not appropriate?

In some tables, instead of introducing an extra AUTO_INCREMENT field, I would like to make all the columns together the PK. Am I wrong?

Because my thinking is against using AUTO_INCREMENT, please also suggest cases where keeping an AUTO_INCREMENT field is useful.

Bill Karwin · Accepted Answer

The auto-increment PK column can be called a surrogate key.

Using surrogate keys can be a helpful optimization in some cases:

If no other set of columns in the table can reliably be treated as a candidate key. For your example, you may not be able to say that the combination of (name, age, address) is guaranteed to uniquely identify rows in all cases. It may seem unlikely that there would be two people with the same name, same age, living at the same address. But it's still not invalid for that to happen. Using a surrogate key makes it possible for all the other columns to be non-unique in such cases.
It may be desirable for the PK to be unchanging. For example, a person could change their name, but they are still the same person. SQL allows PK values to change, of course, but then all other data that references the PK by value has to change too. If your RDBMS supports foreign keys with ON UPDATE CASCADE, you can automate this. But what if you don't have ON UPDATE CASCADE (e.g. Oracle), or you don't have foreign keys (e.g. older MySQL or SQLite), or you have data stored outside the RDBMS? Using a surrogate key means any of the "natural" data columns are free to change values without changing the identity of the row. Surrogate key values are arbitrary and unrelated to the natural data, so the keys never need to change.
Even if there are columns you can use as a candidate key, it may be necessary to use a large subset of columns as a compound primary key. The storage of the key becomes bulky, certainly a lot bulkier than a single integer. So there's an advantage with regard to storage efficiency in using a surrogate key.
Manipulating multi-column PKs also makes more coding work for developers, simply because they need to write longer conditions in JOIN and WHERE clauses. Also, if requirements change such that (name, age, address) is no longer a sufficient PK, you need to add a fourth column to the PK, and then you have to change all the SQL code in all your applications.
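As a rough illustration of the first two points, here is a minimal sketch using Python's built-in sqlite3 module. It is not MySQL: SQLite's INTEGER PRIMARY KEY AUTOINCREMENT stands in for AUTO_INCREMENT, and the Parents table with its parentName column is a hypothetical layout invented for the example, as are the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

# SQLite spelling of the question's table (assumption: AUTOINCREMENT ~ AUTO_INCREMENT).
conn.execute("""
    CREATE TABLE Children (
        childNo INTEGER PRIMARY KEY AUTOINCREMENT,
        name    VARCHAR(25),
        age     INTEGER,
        address VARCHAR(100)
    )
""")
# Hypothetical referencing table: a single integer column is the whole foreign key.
conn.execute("""
    CREATE TABLE Parents (
        parentName VARCHAR(25),
        childNo    INTEGER NOT NULL REFERENCES Children(childNo)
    )
""")

# Two children with identical natural data are still two distinct rows.
for _ in range(2):
    conn.execute(
        "INSERT INTO Children (name, age, address) VALUES (?, ?, ?)",
        ("Sam", 7, "12 Elm St"),
    )
first, second = [r[0] for r in conn.execute(
    "SELECT childNo FROM Children ORDER BY childNo")]

conn.execute("INSERT INTO Parents (parentName, childNo) VALUES (?, ?)", ("Pat", first))

# A name change touches only Children; the reference stored in Parents stays valid.
conn.execute("UPDATE Children SET name = ? WHERE childNo = ?", ("Samuel", first))

row = conn.execute("""
    SELECT p.parentName, c.name
    FROM Parents AS p
    JOIN Children AS c ON c.childNo = p.childNo
""").fetchone()
print(first, second, row)
```

The join condition is a single integer comparison. With (name, age, address) as the key, Parents would have to carry all three columns, and the rename would have to be propagated to it.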

So there are legitimate benefits for surrogate keys.

That said, surrogate keys are often over-used. Many application frameworks (e.g. Ruby on Rails) default to giving every table an integer surrogate key named ID, regardless of whether it's appropriate. You can specify a PK column on a table-by-table basis, but many programmers take the default as a rule, and this leads them to some senseless table designs. The worst example I've seen is for every many-to-many table to have a superfluous ID column.
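To make the many-to-many case concrete, here is a hedged sketch, again with Python's sqlite3 rather than MySQL. The Parents and ChildParent tables and their columns are hypothetical. The link table uses the pair of foreign keys as its primary key, so it needs no ID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Children (childNo  INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(25));
    CREATE TABLE Parents  (parentNo INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(25));

    -- Hypothetical link table: the two foreign keys together form the primary key.
    CREATE TABLE ChildParent (
        childNo  INTEGER NOT NULL REFERENCES Children(childNo),
        parentNo INTEGER NOT NULL REFERENCES Parents(parentNo),
        PRIMARY KEY (childNo, parentNo)
    );

    INSERT INTO Children (name) VALUES ('Sam');
    INSERT INTO Parents  (name) VALUES ('Pat');
    INSERT INTO ChildParent (childNo, parentNo) VALUES (1, 1);
""")

# The compound key already forbids recording the same pairing twice.
try:
    conn.execute("INSERT INTO ChildParent (childNo, parentNo) VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM ChildParent").fetchone()[0]
print(duplicate_rejected, link_count)
```

If the link table had an auto-increment ID as its primary key instead, it would still need a separate UNIQUE constraint on (childNo, parentNo) to prevent duplicate pairings, so the extra column would buy nothing here.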

For what it's worth, using a surrogate key has nothing to do with normalization. That is, rules of normalization neither encourage nor discourage using surrogate keys.

Every database that supports surrogate keys also provides a function that returns the most recently generated id value in the current session. As @JStead mentioned, in SQL Server it's @@IDENTITY or SCOPE_IDENTITY(). In MySQL, it's LAST_INSERT_ID(). And so on.

These functions return only a single value, so you can't get all the generated id values if you insert multiple rows in a single INSERT statement. That's a limitation.
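For the question's "how do I know which value was assigned?", here is a small sketch of the idea using Python's sqlite3, where last_insert_rowid() and the cursor's lastrowid attribute play the role that LAST_INSERT_ID() plays in MySQL. The table is trimmed to two columns and the names are sample data. Treat this as an analogy, not as MySQL's exact behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Children (
        childNo INTEGER PRIMARY KEY AUTOINCREMENT,
        name    VARCHAR(25)
    )
""")

# Single-row insert: the driver exposes the generated key on the cursor.
cur = conn.execute("INSERT INTO Children (name) VALUES (?)", ("Alice",))
child_id = cur.lastrowid

# The same value fetched through SQL, the way LAST_INSERT_ID() is queried in MySQL.
sql_id = conn.execute("SELECT last_insert_rowid()").fetchone()[0]

# Multi-row insert: three keys are generated, but the function reports one value.
conn.execute("INSERT INTO Children (name) VALUES ('Bob'), ('Carol'), ('Dave')")
reported = conn.execute("SELECT last_insert_rowid()").fetchone()
generated = [r[0] for r in conn.execute(
    "SELECT childNo FROM Children "
    "WHERE name IN ('Bob', 'Carol', 'Dave') ORDER BY childNo")]
print(child_id, sql_id, reported, generated)
```

Which of the generated values is reported after a multi-row insert depends on the product (MySQL documents that LAST_INSERT_ID() gives the value for the first inserted row), so code that needs every key usually inserts row by row or queries the rows back.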
