Reputation: 8795

Do Redshift column encodings affect query execution speed?

When creating data tables in Amazon Redshift, you can specify various encodings such as MOSTLY32 or BYTEDICT or LZO. Those are the compressions used when storing the columnar values on disk.

I am wondering if my choice of encoding is supposed to make a difference in query execution times. For example, if I make a column BYTEDICT would that make a difference over LZO when it comes to SELECTs, GROUP BYs or FILTERs?

Upvotes: 4

Answers (3)

Jonas D

Reputation: 298

Bit late but likely useful to anyone taking a look here:

Amazon can now decide on the best compression to use (Loading Tables with Automatic Compression), if you are using a COPY command to load your table, and there is no existing compression defined in your table.

You just have to add COMPUPDATE ON to your COPY command.

Upvotes: 0

Koen Michiels

Reputation: 93

Amazon actually has released a python script that can apply this automatically to your database. You can find this script here https://github.com/awslabs/amazon-redshift-utils/blob/master/src/ColumnEncodingUtility/analyze-schema-compression.py

Upvotes: 5

Rakesh Singh

Reputation: 170

Yes. The compression encoding used translates to amount of disk storage. Generally, the lower the storage the better would be query performance.

But, which encoding would be be more beneficial to you depends on your data type and its distribution. There is no gurantee that LZO will always be better than Bytedict or vice-a-versa. In my experience, I usually load some sample data in the intended table. Than do a analyze compression. Now whatever Redshift suggests, I go with it. That has worked for me.

Upvotes: 7

Do Redshift column encodings affect query execution speed?

Answers (3)

Related Questions