Reputation: 2422
I'm currently integrating API key verification. I generated the keys from a few different data points (email + time, etc.) into a short, so-far-unique hexadecimal string, e.g. "7edbf060" (this cannot be changed as many keys have already gone out). I'm looking to convert/store these keys as a GUID or BIGINT for faster indexing/searching on the table, as it will be accessed about 300-500x per second (currently 40 million+ requests a day), so performance is a BIG factor here!
Obviously leaving the hexadecimal key as a VARCHAR column and performing a search on that would be highly inefficient! So my thought is once the users send their key I can convert it to something else (GUID/BIGINT) and perform my lookup that way.
SELECT CONVERT(uniqueidentifier, 0x7edbf060)
--output: 60F0DB7E-0000-0000-0000-000000000000
SELECT CONVERT(bigint, 0x7edbf060)
--output: 2128343136
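At lookup time the key arrives as a varchar, so it would need converting first. Here is a minimal sketch of that step, assuming SQL Server 2008+ where CONVERT accepts a style argument for hex strings (style 2 = no "0x" prefix); the @apiKey variable is just illustrative:
DECLARE @apiKey varchar(8) = '7edbf060'
-- hex string -> 4-byte binary
SELECT CONVERT(binary(4), @apiKey, 2)
--output: 0x7EDBF060
-- and on to bigint, if that ends up being the storage type
SELECT CONVERT(bigint, CONVERT(binary(4), @apiKey, 2))
--output: 2128343136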
As the keys are generated non-sequentially, the clustered index will get fragmented, but I have scripts that reorganize these tables throughout the day to keep that under control.
So I was thinking of a table like this: [GUID] (uniqueidentifier, PK, not null), [Key] (varchar(8)), [Email] (varchar(max))
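For reference, a rough sketch of that table definition (assuming SQL Server; the table name dbo.ApiKeyLookup is just a placeholder):
CREATE TABLE dbo.ApiKeyLookup
(
    [GUID]  uniqueidentifier NOT NULL PRIMARY KEY CLUSTERED,  -- converted key used for lookups
    [Key]   varchar(8)       NOT NULL,                        -- original hex key as issued
    [Email] varchar(max)     NULL
)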
Are these my only/best options? My concerns are that the hexadecimal key, once converted to a uniqueidentifier in SQL, seems to only use the first 8 characters followed by "-0000-0000-0000-000000000000", and once converted to BIGINT the value is already in the 2 billion range; I don't know how either option will perform once there are a few thousand records with ranges all over the place.
Any help/advice would be greatly appreciated! Thanks.
Upvotes: 0
Views: 159
Reputation: 46320
I suggest you store it natively as a binary(4) data type, since that is the actual type of the source data. A binary column can be a primary key column or indexed as appropriate. The proper data type is the one most appropriate for the domain of the data stored; data type performance is seldom a consideration for even the most demanding of applications.
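To make that concrete, here is a minimal sketch of the binary(4) design and a point lookup against it, assuming SQL Server; dbo.ApiKeys and the column names are illustrative, not your actual schema:
CREATE TABLE dbo.ApiKeys
(
    ApiKey binary(4)    NOT NULL PRIMARY KEY,  -- the 4-byte key stored natively
    Email  varchar(max) NULL
)

-- convert the incoming hex string once (style 2 = no "0x" prefix), then seek on the key
DECLARE @key varchar(8) = '7edbf060'
SELECT Email
FROM dbo.ApiKeys
WHERE ApiKey = CONVERT(binary(4), @key, 2)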
From a performance perspective, I expect you can achieve many thousands of select requests per second (assuming a single row is returned) even on modest hardware as long as the table is small enough to remain in buffer cache. Fragmentation is also not a consideration for singleton requests as long as data are in memory. Fragmentation will affect performance of random requests only when it gets to the point of impacting buffer efficiency.
For a table too large to stay in buffer cache, the random key will require physical I/O roughly proportional to the portion of the table that is not cached. You'll need a storage subsystem capable of meeting the resulting I/O demand of random key access against a large table.
You mention searches. If these require table scans rather than point lookups, it may be problematic to achieve your performance goal depending on the table size. Specify the insert rate and expected table size.
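To illustrate the distinction, using the hypothetical dbo.ApiKeys table above: a lookup on the indexed key column can be satisfied by an index seek, whereas a search on a non-indexed column (or one with a leading wildcard) forces a scan whose cost grows with the table size:
-- point lookup: index seek on the primary key
SELECT Email FROM dbo.ApiKeys WHERE ApiKey = 0x7EDBF060
-- leading-wildcard search on a non-indexed column: full table scan
SELECT ApiKey FROM dbo.ApiKeys WHERE Email LIKE '%@example.com'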
Upvotes: 1