virtualadrian

Reputation: 4738

CHAR(64) or BINARY(32) To Store SHA256 Hash in SQL SERVER

I am debating which data type to use when storing a SHA256 hash in SQL Server: CHAR(64) or BINARY(32)? The column will be part of a unique clustered index. I know I'm probably splitting hairs at this point, but I want to do this right the first time, and I know that sometimes primitive data types are faster and other times the newer, more exotic types perform better. (Yes, I know CHAR(64) isn't all that new, but it's newer than byte storage.)

I've looked around and can't find anything about the performance of one vs. the other in terms of search, etc.

Upvotes: 10

Views: 16571

Answers (3)

Pritesh singh

Reputation: 11

BINARY(32) is generally recommended for storing SHA256 hashes in SQL Server.

Reason:

Storage Efficiency: A SHA256 hash is always 32 bytes (256 bits). BINARY(32) maps directly to this size, whereas CHAR(64) needs 64 bytes to hold the same hash as hexadecimal text. BINARY(32) therefore halves the storage for the column, which can also improve performance for operations on it.

Clustered Index Consideration: Since your column will be part of a clustered index, using the most compact data type is beneficial. BINARY(32) takes up less space compared to CHAR(64), leading to a smaller clustered index size and potentially faster index seeks.

While CHAR(64) might seem appropriate because a SHA256 hash is represented as a hexadecimal string (64 characters), BINARY(32) is the more efficient way for SQL Server to store the raw binary hash value.
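The size difference can be checked directly. The following is a minimal sketch, assuming SQL Server 2012 or later (for `SHA2_256` support in `HASHBYTES`); the variable names are illustrative only:

```sql
-- Compare the stored size of the same SHA-256 hash in both representations.
DECLARE @hashBin BINARY(32) = HASHBYTES('SHA2_256', 'test');
DECLARE @hashHex CHAR(64)   = CONVERT(CHAR(64), @hashBin, 2);  -- style 2: hex text, no '0x' prefix

SELECT
    DATALENGTH(@hashBin) AS BinaryBytes,  -- 32
    DATALENGTH(@hashHex) AS CharBytes;    -- 64
```

Every nonclustered index on the table also carries the clustered key, so the per-row saving of 32 bytes is repeated in each of them.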

Ultimately, the difference in performance between the two data types might be negligible for most use cases. But if you're dealing with a large number of hashes or are concerned about storage optimization, BINARY(32) is the better choice.

Upvotes: 0

Finisl

Reputation: 26

The choice of data type may depend on how you're going to use (update/consume) the data.

For our company's data warehouse, we chose to store the hashed value as BINARY(32), because this value is used as a primary key or foreign key and never needs to be consumed as CHAR or VARCHAR. The benefit is that less storage is needed, so the related indexes are smaller as well. Overall, this leads to better performance.
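As a rough sketch of that kind of setup (the table and column names here are hypothetical, not the actual warehouse schema), a BINARY(32) hash can serve directly as the clustered primary key:

```sql
-- Hypothetical table keyed on a SHA-256 hash stored as raw bytes.
CREATE TABLE dbo.HashedItem (
    ItemHash BINARY(32)   NOT NULL,
    Payload  VARCHAR(400) NOT NULL,
    CONSTRAINT PK_HashedItem PRIMARY KEY CLUSTERED (ItemHash)
);

-- HASHBYTES returns VARBINARY; it converts implicitly to BINARY(32) on insert.
INSERT INTO dbo.HashedItem (ItemHash, Payload)
VALUES (HASHBYTES('SHA2_256', 'example'), 'example');

-- Lookup by hash: the comparison stays binary-to-binary, with no string conversion.
SELECT Payload
FROM dbo.HashedItem
WHERE ItemHash = HASHBYTES('SHA2_256', 'example');
```

Note that `HASHBYTES` hashes the bytes of its input, so a VARCHAR literal and an NVARCHAR literal with the same text produce different hashes; the input type needs to be consistent between writes and lookups.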

I guess CHAR(64) would be preferable if you need to interact with applications that do not handle binary values, such as logging applications or command-line interface (CLI) applications.

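-- Compute a SHA-256 hash and render it as hex text in two ways.
DECLARE @binTest BINARY(32);
DECLARE @varTest1 VARCHAR(66);  -- style 1: '0x' prefix plus 64 hex characters
DECLARE @varTest2 VARCHAR(64);  -- style 2: 64 hex characters, no prefix
SELECT @binTest = HASHBYTES('SHA2_256', 'test');
SET @varTest1 = CONVERT(VARCHAR(66), @binTest, 1);
SET @varTest2 = CONVERT(VARCHAR(64), @binTest, 2);
PRINT @binTest;   -- PRINT converts binary to character data implicitly, so this is not readable hex
PRINT @varTest1;
PRINT @varTest2;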
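If a hex string comes back from such an application, it can be converted to binary again with the same `CONVERT` styles. A minimal sketch, assuming the string has no `0x` prefix (style 2); the variable names are illustrative:

```sql
-- Round trip: binary -> hex text -> binary.
DECLARE @hex CHAR(64)   = CONVERT(CHAR(64), HASHBYTES('SHA2_256', 'test'), 2);
DECLARE @bin BINARY(32) = CONVERT(BINARY(32), @hex, 2);

-- The converted value should compare equal to the original hash.
SELECT CASE WHEN @bin = HASHBYTES('SHA2_256', 'test')
            THEN 'match' ELSE 'mismatch' END AS RoundTrip;
```

For strings that do carry the `0x` prefix, style 1 would be used instead of style 2.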

Upvotes: 0

Diego

Reputation: 36166

You do know that by using CHAR(64), each row will occupy 64 bytes even if your key is "A"? I am not going to discuss the fact that you are using a string as a clustered index; I just assume you have a good reason for that. But why use CHAR instead of VARCHAR? Are you planning to update the value? That would be the only reason I can see to use CHAR instead of VARCHAR.

Upvotes: -5
