Sanket_patil

Reputation: 300

Redshift cluster backup disk space

The official documentation says that Redshift

maintains at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3)

So if both the original and the replica live on the same cluster, does that mean only half of my cluster's storage is actually usable, with the other half taken up by the replica? Also, how can I see or confirm this backup data in the cluster?

Upvotes: 1

Views: 164

Answers (2)

John Rotenstein

Reputation: 269410

Each Amazon Redshift compute node actually has twice as much storage as publicly stated. The extra space is used to back up other nodes, so the replica does not eat into the advertised capacity.

You can see this in a query like this:

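SELECT
  owner AS node,   -- node that owns the data on this partition
  host,            -- node the partition is physically attached to
  diskno,
  used,
  capacity,
  used / capacity::numeric * 100 AS percent_used
FROM stv_partitions
ORDER BY 1, 2, 3;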

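Primary storage is the set of rows where the host column of stv_partitions equals owner (aliased as node in the query). Rows where the two differ show storage on one node being used as a backup for another.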
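To get at the "how much is actually usable" part more directly, the same table can be summarized per node over primary partitions only. This is a rough sketch, assuming the primary partitions are the ones where host matches owner, as described above, and that this holds for your node type. The aliases used_mb and capacity_mb are just illustrative names; stv_partitions reports used and capacity in 1 MB blocks, and the table is generally only visible to superusers.

```sql
-- Per-node usage of primary storage only.
-- Assumption: host = owner identifies primary (non-mirror) partitions.
SELECT
  owner AS node,
  SUM(used) AS used_mb,          -- 1 MB blocks in use
  SUM(capacity) AS capacity_mb,  -- 1 MB blocks available in total
  ROUND(SUM(used)::numeric / SUM(capacity) * 100, 2) AS percent_used
FROM stv_partitions
WHERE host = owner
GROUP BY owner
ORDER BY owner;
```

If the capacity_mb totals here add up to roughly the advertised size of the cluster, that is a reasonable sign that the mirror copies sit in the extra space rather than in your quota.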

Upvotes: 2

ketan vijayvargiya

Reputation: 5659

I think you misunderstood the documentation.

Amazon Redshift replicates all your data within your data warehouse cluster when it is loaded and also continuously backs up your data to S3. Amazon Redshift always attempts to maintain at least three copies of your data (the original and replica on the compute nodes and a backup in Amazon S3).

This actually talks about two types of backups:

  1. The original and replica on the compute nodes: This refers to Redshift's internal backup mechanism. Every cluster with more than one node is made up of two types of nodes: a leader node and compute nodes. Redshift internally replicates your data across the compute nodes, so if one compute node goes down, your data is not lost. In other words, this replication is there for durability.

    Sure, the extra backup takes space in your cluster, but I don't think Redshift allows modifying this setting or accessing the backup data as such. It's all transparent to you.

  2. The backup in Amazon S3: These backups are accessible to you, and you can restore an existing one into a new cluster.

More information on both can be found here.

Upvotes: 1
