John
John

Reputation: 503

Data Mart design - Best practice - Why are foreign keys not used?

I am working on project, where a smaller datamart (maybe 30 tables) was implemented from the scratch. Now, the colleague with deep knowledge of this mart is going to do another project, left me alone in this project (with some support from him).

I just thought that I generate database diagram and thus I will not need any extra help when I would modify an ETL and do some joins etc. But, surprisingly for me, newbie in this profession...

I generated diagram, and no star or snowflake schema was there, just plain tables without primary and foreign keys. So My work was to try to imagine how the tables are related, then consult it if it is true, then remake my script etc. ANNOYING.

When I ask why it is like this (without relations between tables), I got this answer: "It is because of performance."

Is it usual to solve it like this? If no, how to solve it with relations and still good performance?

Upvotes: 0

Views: 4151

Answers (2)

Justin
Justin

Reputation: 86789

Foreign keys are constraints used to ensure consistency of data in a database - their purpose is not to document the structure of your database, rather it is to enforce data consistency rules by controlling what changes are allowed to the database.

This is all good in a live database where data integrity is key, but in a datamart there is no need to enforce these rules - we know the data is consistent because it's a copy / extract of the live database where these rules are enforced.

Foreign keys also come with some disadvantages:

  • They complicate the datamart extract process (you need to ensure that data is extracted in a certain order)
  • They prevent partial exports (where you export only certain tables from your database)
  • They also incur a runtime performance penalty when making changes to the database as the database server has to check / validate each constraint as changes are made

In short, they reduce performance and provide no real benefit, so why bother? Just make sure that your datamart is documented properly elsewhere.

You might be interested in these questions:

Upvotes: 5

user359040
user359040

Reputation:

Justin has provided a good answer to why a DWH may be designed like this.

You may still be able to derive the relationships between tables by checking the indexes that have been set up on those tables - a unique index will normally indicate a primary key on that table, while foriegn keys will normally require non-unique (ie. repeating) indexes.

Also, if the DWH is a Kimball star-schema style database, it should be apparent which tables are dimensions and which tables are fact tables - the former will normally hold meaningful descriptions with a single key field and no numerical measures, while the latter will normally hold multiple key fields (one for each dimension) and numerical measures with few long character fields (normally none).

In a true star schema, dimension tables don't directly link to each other, only to the fact tables (however, you may have a snowflake schema). You should normally be able to tell which dimensions link to which keys on the fact tables, based on the names of the key fields on the dimension tables.

Most importantly of all, however, is this: Documentation is not an optional step when building a datawarehouse. Ask the developer where the documentation is; if it doesn't exist then the developer should be responsible for providing it.

Upvotes: 0

Related Questions