Reputation: 44568
It's fairly general question, but I'd like to know what do you use in determination of primary key of the table. Examples supplied with you reasoning are highly desired.
I noticed that many programmers add ID column and use it as a primary key. I think, it is flawed from design point of view, as ID in that case HAS NOTHING TO DO WITH THE TABLE.
Upvotes: 7
Views: 3239
Reputation: 18940
Use natural keys wherever they work and can be trusted. If you analyze your subject matter into entities and relationships among entities (ER), you should come up with keys that identify the entities in the data itself. If there is an entity whose identity is confused in the data itself, invent an artificial key (usually called a surrogate key). Inventing a key is a last resort.
When you go to build tables, some tables describe entities, and others describe relationships. Entity tables get the same key as the entity. Relationship tables get a compound key with one component for each entity that participates in the relationship. Some relationships won't get a table of their own (many to one). Instead, they will be represented by adding foreign keys to existing tables, so they won't need a primary key of their own.
This will slow you down a little compared to using invented ID fields for every table. But it will result in better data management, which results in better data.
Upvotes: 1
Reputation: 2587
Every Data that is needed to uniquely identify your Entry should be your table ID. If none such Data exists, you have to create one (most commonly, a running number is used).
I disagree on your point that all ID should have something to do with the table because sometimes it isn't enough to uniquely identify the record. Furthermore, you'd have to use multiple ID's which is much harder to work with than a simple running number as ID.
Primary Keys are relatively simple for single tables, but once you have your Entries spread out on multiple tables, things can get messy, especially with many to many connections. Working with foreign Keys also has to be thought through before implementing them.
If you want to work professionally with Databases (or want to do it by the book), it would be best to familiarize yourself with Data Structure Diagrams
EDIT: Unified Modeling Language should help you determine what to use as Primary Key
Upvotes: 1
Reputation: 1258
My thought process in determining a primary key goes like this.
"One record in this table will represent ...?"
"For distinct value of Col X, Col Y, Col Z.. there should only be one row in the table", What are Cols X Y and Z ?"
The CAR_MODEL table.
Hmm this table will store information about different types of cars, should the MANUFACTURER_NAME be the key ? No, I can have many rows identifying different car models from the same manufacturer. Hmm should the MANUFACTURER_NAME and MODEL_NAME be the key ? No, i want to have different rows with the same MANUFACTURER_NAME and MODEL_NAME but different release years in the table at the same time. Ok what about "MANUFACTURER_NAME", "MODEL_NAME" and "RELEASE_YEAR".
Is it possible for me to have two rows with the same MANUFACTURER_NAME, MODEL_NAME and RELEASE_YEAR at the same time? Hmmm no. That wouldn't make sense, they would be the same Car Model, and I only want 1 record per car model. Great, that's the key.
One record in this table will represent a particular model from a particular year from a particular manufacturer. I decide this when i create the table, that's why i created the table, if you can't describe what's going in the table in terms that help identify the key you don't really understand why you are creating it.
Horrible Changes Over Time!!! (surrogate keys, Natural Key, Slowly changing dimensions)
Ah but the information I am storing about a particular Car Model (from a particular Manufacturer and Release Year) may change. Initially I was told that it had two doors, now I find it has four, I want to have this correct information in my table but not lose the old record as people have reported off it and I need to be able to reproduce their old results.
Ok, I will add a new column "MODEL_ID" and make it the primary key of the table, so I can store multiple records with the same model name, manufacturer name and release year. I will also add a valid_from and valid_to timestamp.
This can work well, and indeed with my changes the Primary Key of the table is now MODEL_ID, a surrogate key. But the Natural Key, the Business Key, the key 'at any point in time', is still Model_Name, Manufacturer_Name and Release_Year, and I can't loose sight of that.
Note on Surrogate Keys :
A surrogate key is unique for each row, by definition! A surrogate key makes it easier to manipulate data sometimes, especially data that changes over time. But a surrogate key doesn't in any way replace a Natural Primary Key, you still need to know what the 'grain' of the table is.
If we said that every person in Australia will be assigned a Stack_Overflow_User_id what would we do when Jeff and Joel started giving Stack_Overflow_User_Id's to dogs and cats and multiple IDs to the same people ??
We would say, "hey Jeff and Joel, only give out 1 ID per First_Name, Last_Name, Date_of_Birth and Place_of_Birth!". *
We need to know the natural key or we can give anything a surrogate key!
(* what about people where all these are the same ? don't we need a passport number or some sort of surrogate ? In practice a surrogate is nice and clean, but where did it originate ? originally it came from a natural key.)
Upvotes: 4
Reputation:
Think of it as a possible unique identifier (single or multiple columns) for your records.
Think about finger prints. Do you think they are unique to an individual; it hasn't been proven yet, but it sure seems like a decent unique identifier until the population gets so large that redundancy creeps in. At present, this is like a primary key for records that identify you. [1 column]
In the case that our population explodes, and finger prints start showing their weaknesses, we can combine finger prints and iris scans to be a much stronger primary key. [2 columns]
The primary key is usually unique by design such as an ID number provided at the instantiation of the record in our database.
At the least, I hope this helps with the concept.
Upvotes: 0
Reputation: 1938
When I use surrogate keys it seems to me that performance increases. I usually use Int ID for performance.
Upvotes: 1
Reputation: 12157
The primary key should always be an auto-increment integer that's unrelated to your data.
Edited to add that GUIDs are fine too. The important thing is that the key doesn't describe your data, so if your data changes, your PK doesn't. Always use an ID field.
Consider you're using an email as your primary key, and then the user changes their email address. You then have to cascade that change to every joined table. Using real data as your PK doesn't make sense.
Upvotes: 0
Reputation: 527
In theory any unique field could be used (e.g. Social Security Number, url, etc.), but in practice I don't think there's any big disadvantage to using an auto-generated ID. For example, some wacky mistake makes a duplicate SSN could be disastrous to your data.
Upvotes: 0
Reputation:
The role of a primary key is to uniquely identify each row in your table. If no column or set of columns matches this requirement, an column containing a unique id is often added as a primary key.
I do not agree with your comment about programmers adding an id that has nothing to do with table data. When you need to link data across several tables, a concise id is easier to use than a compound key.
Upvotes: 12
Reputation: 35
Well, in one of the systems we use (and I designed), each user has an auto-incremented primary key as their ID. Other tables that are related to that particular user uses their ID as its primary key (although obviously not auto-incremented) as well, so it does make sense if used correctly.
Upvotes: 0
Reputation: 46366
A primary key doesn't necessarily have to be a single column, but can also be a combination of columns. As Altherac's answer states it's purpose is to uniquely identify each row.
For performance reasons it's better to have a small key, but depending on the needs of the system the type of key used can vary greatly.
Upvotes: 1
Reputation: 3663
You did of course Google about this first, right? I see that the first results that pop up for me with the proper definition of a primary key also contain examples.
Upvotes: 2
Reputation: 2398
You pick anything that you know will be a unique value, preferably something numeric such as a customer ID or account number. Stay away from string based keys if at all possible. If nothing else, use a GUID value or an Auto-increment integer.
Upvotes: 3
Reputation: 44046
A key should be a column where each entry is guaranteed to be unique. Examples might be things like a social insurance number or driver's license number. In theory you can tie multiple columns together into a compound key. So perhaps name and birth day might be unique together so they could be a key. However in practice nobody does that because crossing tables is a pain. The best solution is usually to add an autoincrementing value or GUID column.
Upvotes: 2
Reputation: 118118
Hollywood Couples: Relationships are harder than you think ... one of my favorites.
Upvotes: 3