Reputation: 942
CREATE TABLE [sql_table1] ([c0] varbinary(25) NOT NULL primary key)
go
insert into sql_table1 values (0x3200),(0x32);
go
I get:
Cannot insert duplicate key in object 'dbo.sql_table1'. The duplicate key value is (0x32).
Why? 0x32 does not equal 0x3200.
Upvotes: 4
Views: 1178
Reputation: 21641
It gets right-padded. BINARY data gets tricky when you try to specify hex values that should normally be numerically equivalent. If you try this, it will work:
insert into #sql_table1 values (0x32),(CAST(50 as VARBINARY(25)));
-- inserts 0x32
-- and 0x00000032
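The right-padding mentioned above can be seen directly with a plain comparison (a minimal sketch; DATALENGTH just shows how many bytes each literal is actually stored as):
SELECT CASE WHEN 0x32 = 0x3200 THEN 'equal' ELSE 'not equal' END; -- 'equal': the shorter value is right-padded with 0x00 before comparing, which is why the PK saw a duplicate
SELECT DATALENGTH(0x32), DATALENGTH(0x3200); -- 1 and 2: they are stored with different byte counts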
But those two values - 0x32 and 0x00000032 - are numerically equivalent. Generally speaking, it's a bad idea to have a BINARY column of any sort be a primary key, or to put a unique index on it (more so than a CHAR/VARCHAR/NVARCHAR column) - any application that inserts into it is almost certainly going to be CASTing from some native format/representation to binary, but there's no guarantee that that CAST actually works in a unique manner in either direction - did the application insert the value 50 (= 0x32), did it try to insert the literal 0x32, did it try to insert the ASCII value of '2' (= 0x32), or did it insert the first byte(s) of something else...? If one app submits 0x32 and another 0x0032, are they the same or different (SQL Server says different - see below)?
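To make that ambiguity concrete, here's a rough sketch (the output bytes in the comments are what I'd expect from the usual CAST behavior):
SELECT CAST(50 AS VARBINARY(25));  -- 0x00000032 (the 4 bytes of the int 50)
SELECT CAST('2' AS VARBINARY(25)); -- 0x32       (the ASCII byte of the character '2')
SELECT 0x32;                       -- 0x32       (a literal single byte)
-- three different "meanings" that all collide around the same-looking hex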
The bottom line is that SQL Server is going to behave unusually if you try to compare binary columns flat out without some context. What will work is comparing binary data using BINARY_CHECKSUM:
SELECT BINARY_CHECKSUM(0x32) -- 50
SELECT BINARY_CHECKSUM(0x320) -- 16! -- it's treating this as having a different number or ordering of bytes
SELECT BINARY_CHECKSUM(0x3200) -- 50
SELECT BINARY_CHECKSUM(0x32000) -- 16
SELECT BINARY_CHECKSUM(0x0032) -- 50
SELECT BINARY_CHECKSUM(0x00032) -- 50
SELECT BINARY_CHECKSUM(0x000000000032) -- 50
but again, this only helps you see that the hexadecimal representation of the binary data isn't going to work exactly the way it would seem. The point is, your primary key is going to be based on the BINARY_CHECKSUMs of the data instead of any particular format/representation of the data. Normally that's a good thing, but with binary data (and padding) it becomes a lot trickier. Even then, in my example above the BINARY_CHECKSUM of both rows will be exactly the same (SELECT BINARY_CHECKSUM(c0) FROM #sql_table1 will output 50 for both rows). Weird - a little further testing shows that any different number of leading 0s that fits into the column length will bypass the unique check even though the checksum is the same (e.g. VALUES (0x32), (0x032), (0x0032), etc.).
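For example, against a fresh copy of the question's table (sql_table2 is just a throwaway name here, using whole-byte literals), I'd expect something like this to insert three rows whose checksums all come back as 50:
CREATE TABLE [sql_table2] ([c0] varbinary(25) NOT NULL primary key)
go
insert into sql_table2 values (0x32),(0x0032),(0x000032); -- all three pass the unique check (right-padded comparison sees different lengths of trailing bytes)
SELECT c0, BINARY_CHECKSUM(c0) FROM sql_table2;           -- checksum should be 50 for every row
go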
This only gets worse if you start throwing different versions of SQL Server into the mix (per MSDN documentation).
What you should do for PK/unique design on a table is figure out what context will make sense of this data - an order number, a file reference, a timestamp, a device ID, some other business or logical identifier, etc. If nothing else, pseudokey it with an IDENTITY column.
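A minimal sketch of that pseudokey approach (names are placeholders) - the binary value becomes ordinary data and the IDENTITY column carries the uniqueness:
CREATE TABLE [sql_table1_v2] (
    [id] int IDENTITY(1,1) NOT NULL primary key, -- surrogate/pseudokey
    [c0] varbinary(25) NOT NULL                  -- the binary payload, no longer the key
)
go
insert into sql_table1_v2 (c0) values (0x3200),(0x32); -- both rows from the question now insert without a duplicate-key error
go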
Upvotes: 2