Ilya Gazman
Ilya Gazman

Reputation: 32271

Should you zip files when saving blobs to SQL?

I have JSON file that I want to save as a blob to Microsoft SQL Server.

The pros for zipping is saving space, the cons is the readability that getting lost.

I want to know if T-SQL has any optimization in which it zips the blobs on its own. I know that columnar databases work this way, like Vertica or Postgres for example.

Upvotes: 3

Views: 601

Answers (1)

sniperd
sniperd

Reputation: 5274

I personally would not compress them if I wanted to be able to search by them. I do not believe it compresses a blob on it's own. I know for a fact even just very large VARCHAR columns do not compress on their own, so I would not expect a blob to. However there is built in compression you can turn on:

https://blogs.msdn.microsoft.com/sqlserverstorageengine/2015/12/08/built-in-functions-for-compressiondecompression-in-sql-server-2016/

https://learn.microsoft.com/en-us/sql/relational-databases/data-compression/enable-compression-on-a-table-or-index?view=sql-server-2017

There are some advantages to it but usually at the cost of CPU. So if I were you, I'd probably not zip up the files to put in SQL, but I might compress the tables I store. It would depend on exactly what the data was, json probably gets a lot of space back on compression, but a .jpeg would not.

An option I have done in the past is to simply store my files on a content server somewhere, and store in SQL the meta data about the file (name, tags, patch to where I stored it, file extension, etc.) That way my data is easy to get at/put there and I simply use SQL to look it up. Additionally it has allowed me when it was large text files to also use Lucene indexes from solr to make a full text searchable solution since the data wasn't stuffed into a SQL table. Just an idea! :)

One more thought, if I were to store big json files into SQL I would probably choose VARCHAR(MAX) or NVARCHAR(MAX) as my datatype. Anytime I have tried to use TEXT, IMAGE, etc. I would later run into some kind of SQL error if I tried to do a tricky query. I believe Microsoft is trying to use VARCHAR(MAX) to replace the blob type of data types and is slowly deprecating them.

Upvotes: 3

Related Questions