Grijesh Chauhan
Grijesh Chauhan

Reputation: 58291

Keeping AUTO_INCREMENT field is inappropriate design?

In a database, I saw many tables in which the Primary-Key(PK) is AUTO_INCREMENT type.

Suppose I have a table Children created as follows:

CREATE TABLE Children(
  childNo INTEGER AUTO_INCREMENT NOT NULL PRIMARY KEY,
  name    VARCHAR(25),
  age     INTEGER,
  address VARCHAR(100) 
)

Keeping auto-increment field in a relation denotes that normalization is not appropriate?

In some table instead to introduce an extra AUTO_INCREMENT field I would like to keep all column as PK. Am I wrong?

Because my thoughts are against using AUTO_INCREMENT, please suggest me also usability to keep AUTO_INCREMENT field?

Upvotes: 3

Views: 212

Answers (2)

Bill Karwin
Bill Karwin

Reputation: 563021

The auto-increment PK column can be called a surrogate key.

Using surrogate keys can be a helpful optimization in some cases:

  • If no other set of columns in the table can reliably be treated a candidate key. For your example, there may not be able to say that the combination of (name, age, address) is guaranteed to uniquely identify rows in all cases. It may seem unlikely that there would be two people with the same name, same age, living at the same address. But it's still not invalid for that to happen. Using a surrogate key makes it possible for all the other columns to be non-unique in such cases.
  • It may be desirable for the PK to be unchanging. For example a person could change their name, but they are still the same person. SQL allows PK values to change, of course, but then all other data that references the PK by value has to change too. If your RDBMS supports foreign keys with ON UPDATE CASCADE, you can automate this. But what if you don't have ON UPDATE CASCADE (e.g. Oracle), or you don't have foreign keys (e.g. older MySQL or SQLite), or you have data stored outside the RDBMS? Using a surrogate key means any of the "natural" data columns are free to change values without changing the identify of the row. Surrogate key values are arbitrary and unrelated to the natural data, so the keys never need to change.
  • Even if there are columns you can use as a candidate key, it may be necessary to use a large subset of columns, as a compound primary key. The storage of the key becomes bulky, certainly a lot bulkier than a single integer. So there's an advantage with regards to storage efficiency to use a surrogate key.
  • Manipulating multi-column PK's also makes more coding work for developers, simply because they need to write longer conditions in JOIN and WHERE clauses. Also, if requirements change such that (name, age, address) is no longer a sufficient PK, you need to add a fourth column to the PK, now you have to change all the SQL code in all your applications.

So there are legitimate benefits for surrogate keys.

That said, surrogate keys are often over-used. Many application frameworks (e.g. Ruby on Rails) use a default that every table has an integer surrogate key named ID regardless of whether it's appropriate. You can specify a PK column on a table by table basis, but many programmers take the default as a rule, and this leads them to have some senseless table designs. The worst example I've seen is for every many-to-many table to have a superfluous ID column.

For what it's worth, using a surrogate key has nothing to do with normalization. That is, rules of normalization neither encourage nor discourage using surrogate keys.

Every database that supports surrogate keys also provides a function that returns the most recently generated id value in the current session. As @JStead mentioned, in SQL Server it's @@IDENTITY or SCOPE_IDENTITY(). In MySQL, it's LAST_INSERT_ID(). And so on.

These functions return only a single value, so you can't get all the generated id values if you insert multiple rows in a single INSERT statement. That's a limitation.

Upvotes: 3

JStead
JStead

Reputation: 1730

I would agree that is over used but it does have it's place in the RDBMS world. You didn't specify a database and this is more of a philosophical debate then any implementation so I will use sql server in my example below.

The best case for an auto_increment or identity is it much more efficient and easier to use than natural keys in a lot of case where the are wide or multi column natural keys.

Lets take the below table.

CREATE TABLE TABLE_OBJECT (
 Table_ID int identity(1,1),
 Server_NME varchar(128),
 Database_NME varchar(128),
 Table_NME varchar(128)
)

CREATE TABLE COLUMN_OBJECT (
 Column_ID int identity(1,1),
 Table_ID int not null,
 Server_NME varchar(128),
 Database_NME varchar(128),
 Table_NME varchar(128),
 Column_NME varchar(128)
)

Now lets say in this scenario we want to join the two table together and we don't have the identities.

select * from TABLE_OBJECT to
inner join COLUMN_OBJECT co on co.Server_NME = to.Server_NME
                          and co.Database_NME = to.Database_NME
                          and co.Table_NME = to.Table_NME

On top of that sucking to write it is also very inefficient I had to read 6*(128) bytes to compare one row.

Now compare that to the simplicity of the following.

select * from TABLE_OBJECT to
inner join COLUMN_OBJECT co on co.Table_id = to.Table_ID

In the above example I only had to read 2*(4) bytes to compare one row. That is a huge difference when you have a lot of rows.

Then there is the plain storage side too.

CREATE TABLE COLUMN_OBJECT (
     Server_NME varchar(128),
     Database_NME varchar(128),
     Table_NME varchar(128),
     Column_NME varchar(128)
    )

versus

CREATE TABLE COLUMN_OBJECT (
     Column_ID int identity(1,1),
     Table_ID int not null,
     Column_NME varchar(128)
    )

The storage on this is identity version 2*(4) + 128 bytes versus natural key version 4*128 bytes.

Uniqueness can also still be guaranteed with a unique constraint on the table_id and the column_nme. Then in the parent table a unique constraint on table_nme,database_nme,server_nme. To be honest though I would have created a database table as well likely and table would have just a unique constraint on database_id and table_nme but you get the idea. If the proper columns are selected for unique indexes uniqueness should never be an issue.

Getting the value of a previously inserted identity or auto_incremenet is trivial in most languages as well.

select @@IDENTITY
or
select LAST_INSERT_ID()

Every language has a way to get the last one.

Upvotes: 2

Related Questions