Reputation: 417
I'm trying to add dist and sort keys to some of the tables in Redshift.
I noticed that before adding them the size of the table is 0.50, and after adding them it increases to 0.51 or 0.52. Is this expected? The whole purpose of having dist and sort keys is to decrease the size of the table and improve read/write performance.
Upvotes: 0
Views: 1219
Reputation: 269390
That is not the purpose of having a DISTKEY and SORTKEY.
To decrease the storage size of a table, use compression.
The DISTKEY is used to distribute data amongst slices. By co-locating information on the same slice, queries can run faster. For example, if you had these tables:
customer table, DISTKEY = customer_id
invoices table, DISTKEY = customer_id
...then these tables would be distributed in the same manner. All records in both tables for a given customer_id would be located on the same slice, thereby avoiding the need to transfer data between slices. The DISTKEY should be the column that is most often used in JOINs.
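To illustrate the co-location idea, here is a minimal Python sketch, not Redshift itself. It assumes a made-up slice count (NUM_SLICES) and uses CRC32 as a stand-in for whatever hash Redshift applies internally; the slice_for and distribute helpers and the sample rows are hypothetical.

```python
import zlib

NUM_SLICES = 4  # assumed slice count, for illustration only


def slice_for(key, num_slices=NUM_SLICES):
    """Map a distribution key to a slice using a stable hash.

    CRC32 is only a stand-in here; Redshift's real hash function is internal.
    """
    return zlib.crc32(str(key).encode()) % num_slices


def distribute(rows, key_column, num_slices=NUM_SLICES):
    """Group rows by the slice that their distribution key hashes to."""
    slices = {i: [] for i in range(num_slices)}
    for row in rows:
        slices[slice_for(row[key_column], num_slices)].append(row)
    return slices


# Hypothetical sample data: both tables are distributed on customer_id.
customers = [{"customer_id": i, "name": f"customer-{i}"} for i in range(1, 9)]
invoices = [{"invoice_id": 100 + n, "customer_id": (n % 8) + 1} for n in range(20)]

customer_slices = distribute(customers, "customer_id")
invoice_slices = distribute(invoices, "customer_id")

# Check that every invoice sits on the same slice as its customer row,
# which is what lets a join on customer_id run without moving data.
colocated = all(
    any(c["customer_id"] == inv["customer_id"] for c in customer_slices[s])
    for s, invs in invoice_slices.items()
    for inv in invs
)
print("all invoices co-located with their customer:", colocated)
```

Because both tables hash the same column with the same function, matching rows always end up on the same slice. Distributing invoices on a different column would break that guarantee.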
The SORTKEY is used to sort data on disk, for the benefit of Zone Maps. Each storage block on disk is 1MB in size and contains data for only one column in one table. The data for this column is sorted, then stored in multiple blocks. The Zone Map associated with each block identifies the minimum and maximum values stored within that block. Then, when a query is run with a WHERE clause, Amazon Redshift only needs to read the blocks whose range could contain the desired data. By skipping over blocks that cannot match the WHERE clause, Redshift can run queries much faster.
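The effect of sorting on Zone Maps can be sketched with a small Python simulation. This is a toy model under stated assumptions: blocks hold a fixed number of rows (BLOCK_ROWS) rather than 1MB of data, and build_zone_map and blocks_to_read are hypothetical helpers, not Redshift functions.

```python
import random

BLOCK_ROWS = 100  # assumed rows per block; real blocks are sized at 1MB


def build_zone_map(values, block_rows=BLOCK_ROWS):
    """Split a column into blocks and record each block's (min, max)."""
    blocks = [values[i:i + block_rows] for i in range(0, len(values), block_rows)]
    return [(min(b), max(b)) for b in blocks]


def blocks_to_read(zone_map, low, high):
    """Count blocks whose [min, max] range overlaps low <= x <= high."""
    return sum(1 for lo, hi in zone_map if hi >= low and lo <= high)


random.seed(0)
shuffled = list(range(10_000))
random.shuffle(shuffled)

sorted_map = build_zone_map(sorted(shuffled))  # column stored in sort order
unsorted_map = build_zone_map(shuffled)        # column stored in arrival order

# Equivalent of: WHERE x BETWEEN 2000 AND 2499
sorted_reads = blocks_to_read(sorted_map, 2_000, 2_499)
unsorted_reads = blocks_to_read(unsorted_map, 2_000, 2_499)

print("blocks read when sorted:  ", sorted_reads, "of", len(sorted_map))
print("blocks read when unsorted:", unsorted_reads, "of", len(unsorted_map))
```

With sorted data the 500 matching values occupy exactly 5 contiguous blocks, so the other 95 can be skipped. When the values are shuffled, nearly every block's min/max range overlaps the predicate, so the Zone Map rules out very little.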
The above can all work together. For example, compressed data requires fewer blocks, which also allows Redshift to skip over more data based on the Zone Maps. To get the best possible performance out of queries, use DISTKEY, SORTKEY and compression together.
(It is often recommended not to compress the SORTKEY column: a heavily compressed sort column packs so many rows into each block that Redshift ends up loading far more rows than the query needs.)
See also: Top 10 Performance Tuning Techniques for Amazon Redshift
Upvotes: 8