AdCan
AdCan

Reputation: 127

No primary key vs Composite primary Key

My scenario is to have a table for logging and reporting some important settings that users can change in a set of sites we maintain (each site has its own DB). When I was designing the table (going to be deployed on each DB), I was confused if I require a Primary Key or not. I have a column named SiteID and each row in the logging table will have SiteId, newVal of Setting, oldVal of setting, Change date. I will be using this to see the history of changes on a Site, filter changes by date etc. So in this scenario, obviously SiteId alone cannot be a PK, but do I really have to add a new column like LogId to make a composite PK ? Am I missing anything in here ?

Upvotes: 0

Views: 760

Answers (4)

elirevach
elirevach

Reputation: 404

You need to focus on your use case.

If you can avoid data duplication (e.g. from the application side) and you know your data, and know what type of index etc. you need, you can easily avoid that; mainly if you talking about massive scans for reporting and when data duplication is not issue.

In many MPP databases (databases that target reporting, e.g. Vertica) a primary key is optional. So if you know your data and use cases, you can avoid primary keys.

Upvotes: 1

TommCatt
TommCatt

Reputation: 5636

All entity tables should have a PK, not all tables. History/log tables typically do not have a PK because they represent events, not entities. (Rather, they don't need a PK. I've seen them defined with one (generally auto generated) but I have yet to see the value used in a query. As far as I've been able to tell, they are completely superfluous.)

What history and log tables do have is a timestamp field and a field representing what event took place. The rest of the record will have some kind of entity information, including the entity PK, and other data concerning the event -- like before and after values such as the OP specified.

The timestamp column will be indexed, along with the entity ID (so one may examine all events that took place on a specified entity over a time period) and/or the event ID (so one may examine all the entities on which a specified event took place over a time period). This all depends on the type of information needed from the table.

Upvotes: 1

philipxy
philipxy

Reputation: 15118

A table without duplicate rows always has at least one candidate key. We can pick one to declare via PRIMARY KEY; any others are via UNIQUE NOT NULL. (UNIQUE NOT NULL is the constraint that you get from a PRIMARY KEY declaration.) If you don't tell the DBMS about your candidate keys then it can't help you by, say, reducing errors by preventing duplicates or improving performance by implicitly defining an index on its columns.

It's not clear from your question how you record what setting given new & old values are for. If your columns are siteId, setting, newValue, oldValue & changeDate then your table has one candidate key, (siteID, setting, changeDate). As the only candidate key, it is the primary key. If setting values identify the settings so that your columns are siteId, newValue, oldValue & changeDate then you have two candidate keys, (siteID, newValue, changeDate) and (siteID, oldValue, changeDate). Each should get a UNIQUE NOT NULL constraint. One could be via a PRIMARY KEY declaration.

If you add another column of unique/surrogate/id values then the table has another candidate key. As before constrain all of them via a UNIQUE NOT NULL constraint, one of which can be expressed via PRIMARY KEY. There are reasons for adding such a unique/surrogate/id column, but it's not because there's no primary key unless the table would otherwise have duplicates rows.

Upvotes: 1

Kason
Kason

Reputation: 797

I would make the schema like this

{SiteId,version,Setting,Change date}

As, old setting you can find in previous version so keep the value on each version is fine.

Of course, you can also use change date to be a PK like below:

{SiteId,Change date,Setting,}

If the date is include time and your scenario is allow this, using date and siteID is better.

Upvotes: 0

Related Questions