Maxim Masiutin

Reputation: 4782

How to make AWS S3 Glacier files available for retrieval recursively with AWS CLI

How can I make files stored at AWS S3 Glacier available for retrieval recursively from CLI?

I ran the following command:

aws s3 cp "s3://mybucket/remotepath/" localpath --recursive

and got the following warning for each of the files:

warning: Skipping file s3://mybucket/remotepath/subdir/filename.xml. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.

However, the aws s3api restore-object command has a --key parameter that specifies a single object, with no ability to traverse directories recursively.

How can I recursively restore files for retrieval from AWS CLI?

Upvotes: 0

Views: 2076

Answers (2)

Schwarz Software

Reputation: 1552

A quick way to do this is to copy/paste your error messages into a text editor and use find/replace to convert the errors into restore-object calls. Use find/replace like so:

  • Find: warning: Skipping file s3://bucketname/

  • Replace: aws s3api restore-object --bucket <bucketname> --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key

Then do another find/replace like so:

  • Find: . Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
  • Replace (include a space at the front): --output text

Your final output will be a list of commands that look like this: aws s3api restore-object --bucket <bucketname> --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key path/filename.extension --output text

They have been produced from your original input which looked like this: warning: Skipping file s3://mybucket/remotepath/subdir/filename.xml. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
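
If you prefer to script the find/replace rather than doing it in a text editor, here is a minimal sed sketch of the same transformation, assuming bash, the bucket name mybucket from the question, and that the warnings were saved to a hypothetical file named errors.txt:

# convert each warning line into a restore-object command (adjust bucket, days, and tier as needed)
sed -e 's|^warning: Skipping file s3://mybucket/|aws s3api restore-object --bucket mybucket --restore-request Days=25,GlacierJobParameters={Tier=Bulk} --key "|' \
    -e 's|\. Object is of storage class GLACIER.*|" --output text|' \
    errors.txt > restore-commands.sh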

Save the commands in a batch file and run it. Wait many hours for AWS to retrieve the Glacier files. Then call your sync command again, this time including --force-glacier-transfer in the parameter list, and it will sync your Glacier files.

A longer writeup of this process is here: https://schwarzsoftware.com.au/blogentry.php?id=45

Upvotes: 0

Maxim Masiutin

Reputation: 4782

The Perl script to restore the files

You can use the following Perl script to start the restore process of the files recursively and to monitor its progress. After the restore is completed, you can copy the files within the specified number of days.

#!/usr/bin/perl

use strict;
use warnings;
my $bucket = "yourbucket";
my $path = "yourdir/yoursubdir/";
my $days = 5; # the number of days you want the restored file to be accessible for
my $retrievaloption = "Bulk"; # retrieval option: Bulk, Standard, or Expedited
my $checkstatus = 0; # set to 1 to query the restore status instead of starting restores
my $dryrun = 0; # set to 1 to only print the AWS CLI commands without executing them

my $cmd = "aws s3 ls s3://$bucket/$path --recursive";
print "$cmd\n";
my @lines = `$cmd`;
my @cmds;
foreach (@lines) {
  # each "aws s3 ls" output line ends with the object key, which starts with $path
  my $pos = index($_, $path);
  if ($pos > 0) {
    my $s = substr($_, $pos);
    chomp $s;
    if ($checkstatus)
    {
      $cmd = "aws s3api head-object --bucket $bucket --key \"$s\"";
    } else {
      $cmd = "aws s3api restore-object --bucket $bucket --key \"$s\" --restore-request Days=$days,GlacierJobParameters={\"Tier\"=\"$retrievaloption\"}";
    }
    push @cmds, $cmd;
  } else {
    die $_;
  }
} 
undef @lines;
foreach (@cmds)
{
  print "$_\n";
  unless ($dryrun) {print `$_`; print"\n";}
}

Before running the script, modify the $bucket and $path values. Then run the script and watch the output.

You can first run it in a "dry run" mode that will only print the AWS CLI commands to the screen without actually restoring the files. To do that, set the $dryrun value to 1. You can redirect the output of the dry run to a batch file and execute it separately.
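
For example, assuming the script is saved as restore-aws.pl (the file name used later in this answer) and $dryrun is set to 1, the dry-run workflow on a Unix-like shell could look like this:

# write the generated AWS CLI commands to a file instead of running them
perl restore-aws.pl > restore-commands.sh
# the first line of the file is the echoed "aws s3 ls" command; remove it if you do not want to repeat the listing
bash restore-commands.sh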

Monitor the restoration status

After you run the script and the restore process has started, it will take from a few minutes to a few hours for the files to become available for copying.

You will only be able to copy a file after the restore process for it has completed.

To monitor the status, set the $checkstatus value to 1 and run the script again. While the restoration is still in progress, you will see output similar to the following for each file:

{
    "AcceptRanges": "bytes",
    "Restore": "ongoing-request=\"true\"",
    "LastModified": "2022-03-07T11:13:53+00:00",
    "ContentLength": 1219493888,
    "ETag": "\"ad02c999d7fe6f1fb5ddb0734017d3b0-146\"",
    "ContentType": "binary/octet-stream",
    "Metadata": {},
    "StorageClass": "GLACIER"
}

When a file finally becomes available for retrieval, the "Restore" line will look like the following:

"Restore": "ongoing-request=\"false\", expiry-date=\"Wed, 20 Apr 2022 00:00:00 GMT\"",

After that, you will be able to copy the files from AWS S3 to your local disk, e.g.

aws s3 cp "s3://yourbucket/yourdir/yoursubdir/" yourlocaldir --recursive --force-glacier-transfer

Restore options

Depending on the retrieval option you selected in the script for your files stored in the Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) archive tier, "Expedited" retrievals complete in 1-5 minutes, "Standard" in 3-5 hours, and "Bulk" in 5-12 hours. The "Bulk" retrieval option is the cheapest, if not free (it depends on the Glacier tier you chose to keep your files in). "Expedited" is the most expensive retrieval option and may not be available for retrievals from the Amazon S3 Glacier Deep Archive storage tier, for which restoration may take up to 48 hours.

Improve the script to accept command-line parameters

By the way, you can modify the script to accept the bucket name and the directory name from the command line. In this case, replace the following two lines:

my $bucket = "yourbucket";
my $path = "yourdir/yoursubdir/";

with the following lines:

my $numargs = $#ARGV + 1;  
unless ($numargs == 2) {die "Usage: perl restore-aws.pl bucket path/\n";}
my $bucket=$ARGV[0];  
my $path=$ARGV[1];  
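
After this change, you can pass the bucket and the path as command-line arguments, for example:

perl restore-aws.pl yourbucket yourdir/yoursubdir/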

Upvotes: 1
