Reputation: 15698
I am well aware that if I use an nvarchar field as a primary key, or as a foreign key, this will add some time and space overhead to the usage of the generated index in the majority (if not all) of cases.
As a general rule, using numeric keys is a good idea, but under certain common circumstances (small sets of data, for instance) it isn't a problem to use text-based keys.
However, I am wondering if anyone could provide rigorous information on whether it is MORE efficient, or at least equally efficient, to use text for database keys rather than numeric values under certain circumstances.
Consider a case where a table contains a short list of records. For our example, we'll say we need 50 records. Each record needs an ID. You could use generic int (or even smallint) numbers (e.g. [1...50]) OR you could assign meaningful, 2-character values to a char(2) field (e.g. [AL, AK, AZ, AR, ... WI]).
In the above case, we could assume that using a char(2) field is potentially more efficient than using an int key, since the char data is 2 bytes vs. the 4 bytes used by an int. Using a smallint field would theoretically be just as efficient as the char(2) field and, possibly, a varchar(2) field.
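The size argument above can be checked with simple arithmetic. The following is a minimal Python sketch, not a SQL Server measurement: it assumes int is stored in 4 bytes, smallint in 2 bytes, char(2) in 2 bytes under a single-byte code page, and nchar(2) in 4 bytes (2 bytes per character). The names KEY_WIDTHS and key_bytes are hypothetical, introduced only for illustration.

```python
import struct

# Assumed per-key storage widths in bytes. These model the raw data size only
# and ignore any row or index overhead the database engine adds on top.
KEY_WIDTHS = {
    "int": struct.calcsize("<i"),               # 4-byte signed integer
    "smallint": struct.calcsize("<h"),          # 2-byte signed integer
    "char(2)": len("AL".encode("ascii")),       # 1 byte per character
    "nchar(2)": len("AL".encode("utf-16-le")),  # 2 bytes per character
}

ROWS = 50  # the 50-record example from the question


def key_bytes(type_name, rows=ROWS):
    """Raw bytes needed to store `rows` key values of the given type."""
    return KEY_WIDTHS[type_name] * rows


for name, width in KEY_WIDTHS.items():
    print(f"{name:9s} {width} bytes/key, {key_bytes(name)} bytes for {ROWS} keys")
```

At 50 rows the difference between the widest and narrowest key here is about 100 bytes, so raw size alone is unlikely to decide the question. varchar(2) is left out because variable-length columns carry extra per-value length bookkeeping that this model does not capture.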
The benefit of using the text-based key over the numeric key is that the values are readable, which should make it obvious to many that my list of 50 records is likely a list of US states.
As stated, using keys that are smaller than or equal in size to a comparable numeric key should give similar efficiency. However, depending on the architecture and design of the database engine, it is possible that in-practice usage may yield unexpected results.
With that stated, is it ever more, equal or less efficient to use any form of text-based value as a key within SQL Server?
I don't need obsessively thorough research results (though I wouldn't mind it), but I am looking for an answer that goes beyond stating what we would expect from a database.
Definitively, how does efficiency of text-based keys compare to numeric-based keys as the size of the text key increases/decreases?
Upvotes: 3
Views: 1338
Reputation: 2939
Think in assembly to find out the answer. You stated this:
"we could assume that using a char(2) field is potentially more efficient than using an int key, since the char data is 2 bytes vs. the 4 bytes used by an int. Using a smallint field would theoretically be just as efficient as the char(2) field and, possibly, a varchar(2) field."
This isn't true, as you can't move 2 characters simultaneously in a single instruction (to my knowledge). So even though a char is smaller than a 4-byte int, you have to move the characters one by one into the register to do a comparison. To compare two instances of a 4-byte int, even though it is larger in size, you only need 1 move instruction per int (disregarding that you also need to move them out of the register back into memory).
So what happens if you use an int:
In the case of a char, however:
... ints.
The point is that at a low level, the determining factor in this case is not the data size but the number of instructions needed.
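The instruction-count argument can be illustrated with a toy model. This Python sketch only mirrors the reasoning in this answer: it treats a text comparison as one step per byte examined and an integer comparison as a single step. It is an assumption-laden illustration, not a measurement of real CPU instructions or of how SQL Server compares values, and the function names are hypothetical.

```python
def byte_compare_steps(a: bytes, b: bytes) -> int:
    """Count byte-by-byte comparison steps up to and including the first difference."""
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            break
    return steps


def int_compare_steps(a: int, b: int) -> int:
    """Model a fixed-width integer comparison as one step, whatever the values."""
    return 1


# Keys that share a first character need both bytes examined in this model.
print("AL vs AK:", byte_compare_steps(b"AL", b"AK"), "steps")
# Keys that differ in the first character stop after one step.
print("AL vs WI:", byte_compare_steps(b"AL", b"WI"), "steps")
print("1 vs 2:  ", int_compare_steps(1, 2), "step")
```

Under this model the cost of a text comparison grows with the length of the shared prefix, while the integer comparison stays constant.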
Apart from the low-level stuff:
... int instead, then your record will have nothing to do with what happens in reality, in which case you only have to update the abbreviation and the state name itself, and you can sit back assured that everything is consistent.
Upvotes: 3
Reputation: 3996
In most cases, considerations driven by the business requirements (use cases) will far outweigh any performance differences between numeric and text keys. Unless you are looking at very large and/or very high-throughput systems, your choice of primary key type should be based on how the keys will be used rather than on any small difference in performance between numeric and text keys.
Upvotes: 3