cahoskins
cahoskins

Reputation: 125

Surrogate Key vs Natural Key for EF

My co-worker and I are trying to decide which is a better way to design the schema and keys for two database tables. One is a lookup-table that rarely changes. It has about 700 rows. The other table references the lookup-table. This table will have many thousand rows over time. In Design B, the lookup table has a primary key consisting of 3 varchars. The other table has a primary key consisting of the same 3 varchars with the addition of two date fields. In Design A, the 3 varchars are replaced with a surrogate key. The 3 varchars have a unique constraint (UC) on them.

Which is a better design? My co-worker says that if we have a surrogate key, doing joins on the tables will make this very slow when we need to display data to the users. Also, having a key that only is only for making the row unique is wasteful. My argument is that joins are fast and storing extra data for 3 varchars is wasteful because it duplicates this data in two tables.

We are using this in a WPF desktop application with EF 5, in T-SQL Server 2008. Surrogate Key or Natural Key? The attached image shows the two different designs.

enter image description here

Upvotes: 2

Views: 2000

Answers (1)

ypercubeᵀᴹ
ypercubeᵀᴹ

Reputation: 115660

With only a few thousand rows on the tables, I don't think you will notice any difference. And even if one the tables have millions of rows, the other will have as you say only 700. And SQL-Server is pretty much designed to do joins efficiently, so your co-worker is not correct when he claims that a join to a rather small (700 rows) table will affect efficiency.

One aspect that design A is better than B is that the bigger table (PriceIndex) will be narrower and so be the indexes used for the joining. 4 bytes instead of 90 can benefit performance a lot. And every other composite index you may need that includes the surrogate key will be narrower, too, in design A than in B.

A situation where design B will be more efficient than A is queries that involve GROUP BY columns from both tables. If for example you have a query with GROUP BY Price, HubCode, in design B you can add a composite index on these 2 columns while in design A the columns will be in separate tables and you can't have an index with columns from 2 tables.

Another aspect is whether there are other tables with these columns as primary keys, say if you have another table with (HubCode) as the PK and another with (HubCode, TimeFrame) and another with (IndexCode, HubCode) and maybe another with (IndexCode, HubCode, TimeFrame, StartDate, EndDate, CustomerID). With design B (all tables having natural keys), several complex queries involving joins from multiple tables can be more efficient as some intermediate joins can be eliminated. With design A (surrogate keys), intermediate joins cannot be skipped and the lookup costs can grow quite large when the (intermediate) tables are large.

In the end, nothing matters more than testing with your data and the sizes you expect your tables to grow and the type of queries you expect to run.

Upvotes: 4

Related Questions