Tk_fan

Reputation: 1

Amazon Redshift size increase

I inserted data from one table into another table. After the insert, the target table is about 30 MB larger than the actual data size, and its stats_off value has increased to 100. Can anyone explain why the size of the table increased, and how I can reduce the stats_off of the table?

Upvotes: 0

Views: 196

Answers (1)

Bill Weiner

Reputation: 11032

So the table is growing by 5% when you copy the contents from one table to the other. I'll assume that the compression encodings are the same for both tables, since a difference there can change the table size. You say that you insert data using "insert commandS" (plural) and this can affect the size. The unit of storage in Redshift is the 1MB "block" - this is the smallest unit that can be saved. The block is also the unit for coherency. If I save 1 row of data, each column will take up 1MB on disk. There is always some empty space in the last block of each column.

Now if you inserted half of your data, committed, and then inserted the other half, there will be partial blocks for the first insert AND partial blocks for the second insert. This will lead to some increase in table size on disk. These partial blocks exist for every column and on every slice, so the overhead multiplies quickly. You could easily see an increase of 32MB just from this effect.
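To put a rough number on this, here is a small back-of-the-envelope sketch in Python. It assumes a simplified model in which every committed insert after the first leaves at most one extra partially filled 1MB block per column per slice. The function name and the example figures (16 columns, 2 slices) are made up for illustration and are not taken from your cluster.

```python
BLOCK_MB = 1  # Redshift stores column data in 1MB blocks


def max_extra_mb(columns, slices, insert_batches):
    """Upper bound on extra MB caused by partial blocks.

    Simplified model (an assumption, not an exact Redshift formula):
    each insert batch after the first can leave one additional
    partially filled block for every column on every slice.
    """
    if insert_batches < 1:
        raise ValueError("insert_batches must be at least 1")
    return (insert_batches - 1) * columns * slices * BLOCK_MB


# Hypothetical table: 16 columns on a 2-slice cluster, loaded in 2 inserts.
print(max_extra_mb(columns=16, slices=2, insert_batches=2))  # 32
```

With a single insert the model gives zero overhead, and each additional committed insert adds up to columns x slices more megabytes, which is why a few tens of MB of growth is not surprising.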

To reclaim the space lost to these gaps you need to run VACUUM on the table. This will sort and compact the data in the table.

The above assumes that the DDL for the table didn't change. There are a number of ways in which changing the DDL will affect the table size. The compression modes are a clear example but even the distribution key and style can affect table size. It all comes down to how the data is allocated into blocks and stored on disk.

As mentioned before, ANALYZE will recreate the stats for the table, and that is what brings stats_off back down.
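Putting the two maintenance steps together, a minimal sketch could look like the following. It only builds the SQL strings; you would run them through whatever client you already use. The helper name and the `public.target_copy` table are hypothetical, and the names are interpolated directly, so only pass trusted identifiers. The last statement reads `size` (in 1MB blocks) and `stats_off` from the `svv_table_info` system view so you can compare before and after.

```python
def maintenance_sql(schema, table):
    """Return the statements to compact a table and refresh its stats.

    Note: VACUUM cannot run inside a transaction block, so the
    connection that executes these needs autocommit enabled.
    """
    qualified = f'"{schema}"."{table}"'
    return [
        # Re-sort rows and reclaim space in the table.
        f"VACUUM FULL {qualified};",
        # Rebuild planner statistics; stats_off should drop afterwards.
        f"ANALYZE {qualified};",
        # Check the result: size is reported in 1MB blocks.
        'SELECT "table", size, stats_off FROM svv_table_info '
        f"WHERE \"schema\" = '{schema}' AND \"table\" = '{table}';",
    ]


# Hypothetical table name, used only for illustration.
for statement in maintenance_sql("public", "target_copy"):
    print(statement)
```

Running the final SELECT before and after the first two statements shows whether the size and stats_off actually moved.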

Upvotes: 1
