Reputation: 8795
When creating data tables in Amazon Redshift, you can specify various encodings such as MOSTLY32 or BYTEDICT or LZO. Those are the compressions used when storing the columnar values on disk.
I am wondering if my choice of encoding is supposed to make a difference in query execution times. For example, if I make a column BYTEDICT would that make a difference over LZO when it comes to SELECTs, GROUP BYs or FILTERs?
Upvotes: 4
Views: 3253
Reputation: 298
Bit late but likely useful to anyone taking a look here:
Amazon can now decide on the best compression to use (Loading Tables with Automatic Compression), if you are using a COPY command to load your table, and there is no existing compression defined in your table.
You just have to add COMPUPDATE ON
to your COPY command.
Upvotes: 0
Reputation: 93
Amazon actually has released a python script that can apply this automatically to your database. You can find this script here https://github.com/awslabs/amazon-redshift-utils/blob/master/src/ColumnEncodingUtility/analyze-schema-compression.py
Upvotes: 5
Reputation: 170
Yes. The compression encoding used translates to amount of disk storage. Generally, the lower the storage the better would be query performance.
But, which encoding would be be more beneficial to you depends on your data type and its distribution. There is no gurantee that LZO will always be better than Bytedict or vice-a-versa. In my experience, I usually load some sample data in the intended table. Than do a analyze compression. Now whatever Redshift suggests, I go with it. That has worked for me.
Upvotes: 7