aviad
aviad

Reputation: 8278

Why surrogate keys are needed?

I am reading about DW modeling and started wondering why surrogate keys are used at all?

I understand that sometimes business keys are not integers nthat makes the life (as well as joiing and indexing) harder. However, what I do not understand is why to solve kinda limitation of the DW or RDBMS by adding and extra column for managing unique identifiers?

Would it not be more appropriate that this kind of functionality would be transparent to DW/RDBMS users and the entry will get internal identifier from the system automatically? For example creating an SHA-1 digest of the entire row or a subset of it (those fields that can be represented in some kind of a textual format).

Upvotes: 0

Views: 1231

Answers (1)

Cory
Cory

Reputation: 12824

The reason to use surrogate keys is because you have control over the data warehouse but most likely do not have control over the source systems. Assumptions you make today about the stability of the natural keys can cause you problems in the future.

Issues you may run into by not using your own surrogate key:

  1. Large or complex natural key in source - As you already mentioned, the source system could be using a natural key that will not perform as well as a simple integer
  2. Natural key may be reused in source - I ran into an issue once where the source system would recycle keys starting from 1 again once the maximum value an integer can hold was reached (for the application this made sense). The data warehouse had to recognize that the repeated keys were brand new records.
  3. Mergers - Imagine two companies merging together. Each company has an Employee table with an auto incrementing integer used as the key. Each company will have an Employee #1. The DW warehouse will need a surrogate key to distinguish the two people who share the same ID.

Upvotes: 2

Related Questions