Dragoljub Ćurčić
Dragoljub Ćurčić

Reputation: 349

What is the benefit of having varbinary field in a separate 1-1 table?

I need to store binary files in a varbinary(max) column on SQL Server 2005 like this:

FileInfo

FileContent

FileInfo has a one to one relationship with FileContent. The FileText is meant to be used when there is no file to upload, and only text will be entered manually for an item. I'm not sure what percentage of items will have a binary file.

Should I create the second table. Would there be any performance improvements with the two table design? Are there any logical benefits?

I've found this page, but not sure if it applies in my case.

Upvotes: 1

Views: 3449

Answers (3)

Remus Rusanu
Remus Rusanu

Reputation: 294307

There is no performance nor operational advantage. Since SQL 2005 the LOB types are already stored for you by the engine in a separate allocation unit, a separate b-tree. If you study the Table and Index Organization of SQL Server you'll see that every partition has up to 3 allocation units: data, LOB and row-overflow:

Table Organization
(source: s-msft.com)

A LOB field (varchar(max), nvarchar(max), varbinary(max), XML, CLR UDTs as well as the deprecated types text, ntext and image) will have in the data record itself, in the clustered index, only a very small footprint: a pointer into the LOB allocation unit, see Anatomy of a Record.

By storing a LOB explicitly in a separate table you gain absolutely nothing. You just add unneeded complexity as former atomic updates have to distribute themselves now into two separate tables, complicating the application and the application transaction structure.

If the LOB content is an entire file then perhaps you should consider upgrade to SQL 2008 and using FILESTREAM.

Upvotes: 6

eckes
eckes

Reputation: 10423

It can help to separate IMAGE, (N)TEXT, (N)VARCHAR(max) and VARBINARY(max) columns out of wider tables purely for some restrictions of SQL Server.

For example before 2012 it was not possible to online rebuild a clustered table if it contained LOBs. On the other hand you might not care about those restrictions, so setting up the table like your data is related is the better thing to do.

In case you physically want to keep the LOB data out of the table allocation unit you still can set the "large value types out of row" table option.

Upvotes: 0

mjv
mjv

Reputation: 75185

There is no real logical advantage to this two-tables design, since the relationship is 1-1, you might have all the info bundled in the FileInfo table. However, there are serious operational and performance advantages, in particular if your binary data is more than a few hundred bytes in size, on average.

EDIT: As pointed out by Remus Rusanu, on some DBMS implementations such as SQL2005, the large object types are transparently stored to a separate table, effectively alleviating the practical drawback of having big records. The introduction of this feature implicitly confirms the the [true] single table approach's weakness.

I merely scanned the SO posting referenced in this question. I generally thing that while that other posting makes a few valid points, such as intrinsic data integrity (since all CRUD actions on a given item are atomic), but on the whole, and unless of relatively atypical use cases (such as using the item table as a repository mostly queried for single items at a time), the performance advantage is with the two tables approach (whereby indexes on "header" table will be more efficient, queries that do not require the binary data will return much more quickly etc. etc.)

And the two tables approach has further benefits in case the design evolves to supply different types of binary objects in differnt context. For example, say these items are images (GIFs, JPGs etc.). At a later date you want to also provide a small preview version of these images (and/or a hi-resolution version), the choice of this being driven by the context (user preference, low band-width clients, subscriber vs. visitor etc.). In such a case not only are the operational issues associated with the single table approach made more acute, the model becomes more versatile.

Upvotes: 2

Related Questions