Should you zip files when saving blobs to SQL?

Question

I have JSON file that I want to save as a blob to Microsoft SQL Server.

The pros for zipping is saving space, the cons is the readability that getting lost.

I want to know if T-SQL has any optimization in which it zips the blobs on its own. I know that columnar databases work this way, like Vertica or Postgres for example.

sniperd · Accepted Answer

I personally would not compress them if I wanted to be able to search by them. I do not believe it compresses a blob on it's own. I know for a fact even just very large VARCHAR columns do not compress on their own, so I would not expect a blob to. However there is built in compression you can turn on:

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/12/08/built-in-functions-for-compressiondecompression-in-sql-server-2016/

https://learn.microsoft.com/en-us/sql/relational-databases/data-compression/enable-compression-on-a-table-or-index?view=sql-server-2017

There are some advantages to it but usually at the cost of CPU. So if I were you, I'd probably not zip up the files to put in SQL, but I might compress the tables I store. It would depend on exactly what the data was, json probably gets a lot of space back on compression, but a .jpeg would not.

An option I have done in the past is to simply store my files on a content server somewhere, and store in SQL the meta data about the file (name, tags, patch to where I stored it, file extension, etc.) That way my data is easy to get at/put there and I simply use SQL to look it up. Additionally it has allowed me when it was large text files to also use Lucene indexes from solr to make a full text searchable solution since the data wasn't stuffed into a SQL table. Just an idea! :)

One more thought, if I were to store big json files into SQL I would probably choose VARCHAR(MAX) or NVARCHAR(MAX) as my datatype. Anytime I have tried to use TEXT, IMAGE, etc. I would later run into some kind of SQL error if I tried to do a tricky query. I believe Microsoft is trying to use VARCHAR(MAX) to replace the blob type of data types and is slowly deprecating them.

Should you zip files when saving blobs to SQL?

Answers (1)

Related Questions