Relation between DISTSTYLE and Compression encoding in Redshift

Question

Is there any relation between DISTSTYLE and compression encoding in Redshift? Whenever we use compression encoding, the compute node has to do extra work encoding and decoding the data. With DISTSTYLE set to ALL, doesn't every node have to do that encoding and decoding work?

Any conceptual help here is highly appreciated.

John Rotenstein · Accepted Answer

The Distribution Style determines which node/slice will store the data. It has no relationship to, or impact on, the compression type. It simply says where to store the data.

Compression, however, is closely related to the Sort Key, which determines the order in which data is stored. Some compression methods store 'offsets' from previous values, or even the number of times a value repeats, which can significantly compress data (e.g. "repeat this value 1000 times" rather than storing 1000 values).
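To make the sort-order point concrete, here is a toy sketch in Python. It is not Redshift's actual implementation; the function names and the sample column values are made up for illustration. It only shows why the same values tend to compress better once they are stored in sorted order.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

def delta_encode(values):
    """Keep the first value, then each value's offset from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

# Hypothetical column: three distinct values, interleaved on arrival.
unsorted_column = ["US", "UK", "US", "DE", "UK", "US", "DE", "UK"]
sorted_column = sorted(unsorted_column)

unsorted_runs = run_length_encode(unsorted_column)
sorted_runs = run_length_encode(sorted_column)

print(len(unsorted_runs))  # 8: no two neighbours match, so nothing collapses
print(sorted_runs)         # [('DE', 2), ('UK', 3), ('US', 3)]

# Offsets between sorted neighbours are small numbers that need few bits.
sorted_ids = [1000, 1001, 1003, 1006]
print(delta_encode(sorted_ids))  # [1000, 1, 2, 3]
```

In the unsorted column every value differs from its neighbour, so run-length encoding saves nothing. After sorting, eight values collapse to three runs.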

Compression within Amazon Redshift has two benefits:

Less storage space (thus, less cost)
More data can be retrieved for each disk access

The slowest operation of any database is disk access. Therefore, any reduction in disk access will speed operations. The time taken to decompress data is minor compared to the time required for an additional disk read operation.
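As a rough back-of-the-envelope sketch of that trade-off: Redshift stores column data in 1 MB blocks, so fewer stored bytes means fewer blocks to read. The row count, value width, and 4:1 compression ratio below are assumptions picked for illustration, not measurements.

```python
import math

BLOCK_SIZE_BYTES = 1024 * 1024  # Redshift column data is stored in 1 MB blocks

def blocks_needed(row_count, bytes_per_value, compression_ratio=1.0):
    """Estimate how many blocks one column occupies (hypothetical helper)."""
    stored_bytes = row_count * bytes_per_value / compression_ratio
    return math.ceil(stored_bytes / BLOCK_SIZE_BYTES)

rows = 100_000_000  # assumed table size
raw_blocks = blocks_needed(rows, bytes_per_value=8)
compressed_blocks = blocks_needed(rows, bytes_per_value=8, compression_ratio=4.0)

print(raw_blocks)         # 763
print(compressed_blocks)  # 191
```

Under these assumptions a full scan of the column touches about a quarter as many blocks, which is where the speed-up comes from.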

The second most 'expensive' operation is sending data between nodes. While network traffic is faster than disk access, it is best avoided.

When using DISTSTYLE ALL, it simply means that the data is available on every node, which avoids the need to transfer data across the network.
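A small simulation may help picture this. It is a deliberately simplified model, not how Redshift plans joins: the node count, the modulo "hash", and the table contents are all hypothetical. It counts how many fact rows would need a dimension row that lives on a different node.

```python
NODE_COUNT = 4  # hypothetical cluster size

def node_for_key(key, node_count=NODE_COUNT):
    """Toy stand-in for hash distribution: choose a node from the key."""
    return key % node_count

def rows_needing_transfer(fact_rows, dim_placement):
    """Count fact rows whose matching dimension row is not on the same node.

    fact_rows: list of (node_holding_fact_row, dim_key)
    dim_placement: dict mapping dim_key -> set of nodes holding that row
    """
    return sum(
        1 for fact_node, dim_key in fact_rows
        if fact_node not in dim_placement[dim_key]
    )

# Fact table distributed by order_id; each row references a customer_id.
fact_rows = [(node_for_key(order_id), order_id % 10) for order_id in range(1000)]
customer_ids = range(10)

# Dimension table distributed by key: each row lives on exactly one node.
placement_key = {cid: {node_for_key(cid)} for cid in customer_ids}
# Dimension table with DISTSTYLE ALL: every node holds every row.
placement_all = {cid: set(range(NODE_COUNT)) for cid in customer_ids}

transfers_key = rows_needing_transfer(fact_rows, placement_key)
transfers_all = rows_needing_transfer(fact_rows, placement_all)

print(transfers_key > 0)  # True: some lookups cross the network
print(transfers_all)      # 0: every lookup is local
```

The cost of ALL is that each node stores a full copy of the table, which is why it suits small, slowly changing tables.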
