bbb0777

Reputation: 309

Surrogate vs Natural Primary Keys, *SPECIFICALLY* in a Data Warehouse. Is this debated?

Are surrogate vs. natural primary keys generally debated in the world of data warehouses? To be clear, the natural keys would be there regardless. By surrogate keys, I mean keys that don't exist in the source system but are created as part of the data warehouse's ETL.

Is it debated whether to rely on the source system's natural keys as primary keys, or to assign surrogate keys as part of ETL?

My (limited) understanding has always been that in operational systems it could go either way depending on the situation or person, but that in a data warehouse setting surrogate keys were the undisputed norm for primary keys.

Accurate, or is it more debated than that?

Upvotes: 2

Views: 860

Answers (1)

nvogel

Reputation: 25526

Natural keys are virtually essential for almost any practical data warehouse solution (business key or domain key is really a much better term than natural key). The question is whether and when to use surrogate keys as well as, not instead of, some other key. Managing surrogate keys can add a lot of complexity and some significant overhead, so the best answer is "it depends...".
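To make the "as well as, not instead of" point concrete, here is a minimal Python sketch of assigning surrogate keys during a load while keeping the business key on every row. Everything in it is hypothetical and only for illustration: the function `assign_surrogate_keys`, the columns `customer_code` and `customer_sk`, and the in-memory `key_map` (a real warehouse would persist that mapping between loads, typically in the dimension table itself).

```python
from itertools import count


def assign_surrogate_keys(rows, key_map, next_key):
    """Attach a warehouse-generated surrogate key to each source row.

    rows     -- iterable of dicts that carry the source system's business key
    key_map  -- dict mapping business key -> surrogate key (kept across loads)
    next_key -- iterator yielding unused surrogate key values
    """
    loaded = []
    for row in rows:
        business_key = row["customer_code"]  # hypothetical business key column
        if business_key not in key_map:
            # First time this business key is seen: allocate a new surrogate.
            key_map[business_key] = next(next_key)
        # The business key stays on the row; the surrogate is added alongside it.
        loaded.append({"customer_sk": key_map[business_key], **row})
    return loaded


key_map = {}
next_key = count(1)

batch_1 = assign_surrogate_keys(
    [
        {"customer_code": "C-100", "name": "Acme"},
        {"customer_code": "C-200", "name": "Globex"},
    ],
    key_map,
    next_key,
)

# A later load: C-200 reuses its existing surrogate, C-300 gets a new one.
batch_2 = assign_surrogate_keys(
    [
        {"customer_code": "C-200", "name": "Globex Ltd"},
        {"customer_code": "C-300", "name": "Initech"},
    ],
    key_map,
    next_key,
)
```

The lookup against `key_map` on every row, and the need to keep that mapping consistent across loads, is a small taste of the extra complexity and overhead that managing surrogates brings.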

If your warehouse is based on a distributed write-once technology like HDFS, then surrogates would probably make no sense. If you are using some historical data capture mechanism like Microsoft's temporal tables or Oracle's flashback, then you'll probably find no need for surrogates. If you are taking a temporal modelling approach based on 5NF or 6NF, then you usually won't need surrogates either, but you might want to use them for certain tables.

If you are following a template like Data Vault or Kimball's methods then maybe you'll want to use surrogates because that's what it says in somebody's book.

Upvotes: 2
