Reputation: 4782
How can I make files stored at AWS S3 Glacier available for retrieval recursively from CLI?
I run the following command:
aws s3 cp "s3://mybucket/remotepath/" localpath --recursive
and get the following warning for each of the files:
warning: Skipping file s3://mybucket/remotepath/subdir/filename.xml. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
However, the aws s3api restore-object command has a --key parameter that specifies a single object and offers no way to traverse directories recursively.
How can I recursively restore files for retrieval from AWS CLI?
Upvotes: 0
Views: 2076
Reputation: 1552
A quick way to do this is to copy/paste your error messages into a text editor and use find/replace to convert the errors into restore-object calls. Use find/replace like so:
Find: warning: Skipping file s3://bucketname/
Replace: aws s3api restore-object --bucket <bucketname> --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key
Then do another find/replace like so (note that the replacement text starts with a space so that the key and --output stay separated):
Find: . Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
Replace:  --output text
Your final output will be a list of commands that look like this:
aws s3api restore-object --bucket <bucketname> --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key path/filename.extension --output text
They have been produced from your original input which looked like this:
warning: Skipping file s3://mybucket/remotepath/subdir/filename.xml. Object is of storage class GLACIER. Unable to perform download operations on GLACIER objects. You must restore the object to be able to perform the operation. See aws s3 download help for additional parameter options to ignore or force these transfers.
Save the commands in a batch file and run it. Wait many hours for AWS to retrieve the Glacier files. Then run your sync command again, this time including --force-glacier-transfer in the parameter list, and it will sync your Glacier files.
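If you would rather script the conversion than do it in a text editor, a Perl one-liner along these lines should produce the same list of commands. This is only a sketch: it assumes the warnings were saved to a file named warnings.txt, that the output file is called restore-all.sh, and that the object keys contain no spaces.
perl -ne 'print qq{aws s3api restore-object --bucket $1 --restore-request Days=25,GlacierJobParameters={"Tier"="Bulk"} --key "$2" --output text\n} if m{^warning: Skipping file s3://([^/]+)/(\S+)\. Object is of storage class GLACIER}' warnings.txt > restore-all.sh
The generated restore-all.sh can then be run the same way as the hand-edited batch file.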
A longer writeup of this process is here: https://schwarzsoftware.com.au/blogentry.php?id=45
Upvotes: 0
Reputation: 4782
You can use the following Perl script to start the restore process for the files recursively and to monitor its progress. After the restore completes, you can copy the files within the specified number of days.
#!/usr/bin/perl
use strict;
use warnings;

my $bucket = "yourbucket";
my $path = "yourdir/yoursubdir/";
my $days = 5;                 # the number of days you want the restored files to be accessible for
my $retrievaloption = "Bulk"; # retrieval option: Bulk, Standard, or Expedited
my $checkstatus = 0;          # set to 1 to check the restore status instead of starting a restore
my $dryrun = 0;               # set to 1 to only print the AWS CLI commands without running them

# List all objects under the given path recursively
my $cmd = "aws s3 ls s3://$bucket/$path --recursive";
print "$cmd\n";
my @lines = `$cmd`;
my @cmds;

# Build one AWS CLI command per listed object
foreach (@lines) {
    my $pos = index($_, $path);
    if ($pos > 0) {
        my $s = substr($_, $pos); # extract the object key from the listing line
        chomp $s;
        if ($checkstatus) {
            $cmd = "aws s3api head-object --bucket $bucket --key \"$s\"";
        } else {
            $cmd = "aws s3api restore-object --bucket $bucket --key \"$s\" --restore-request Days=$days,GlacierJobParameters={\"Tier\"=\"$retrievaloption\"}";
        }
        push @cmds, $cmd;
    } else {
        die $_;
    }
}
undef @lines;

# Print each command and run it unless this is a dry run
foreach (@cmds) {
    print "$_\n";
    unless ($dryrun) { print `$_`; print "\n"; }
}
Before running the script, modify the $bucket and $path values. Then run the script and watch the output.
You can first run it in a "dry run" mode that only prints the AWS CLI commands to the screen without actually restoring the files. To do that, set the $dryrun value to 1. You can redirect the output of the dry run to a batch file and execute it separately.
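For example, assuming the script is saved as restore-aws.pl and $dryrun is set to 1, you could capture the commands and run them separately like this:
perl restore-aws.pl > restore-commands.sh
sh restore-commands.sh
Note that the first line of the captured output will be the aws s3 ls command the script prints before listing the objects; it is harmless to re-run, or you can delete it from the batch file.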
After you run the script and start the restore process, it will take from a few minutes to a few hours for the files to become available for copying.
You will only be able to copy the files after the restore process completes for each of them.
To monitor the status, set the $checkstatus value to 1 and run the script again. While the restoration is still in progress, you will see output similar to the following for each of the files:
{
"AcceptRanges": "bytes",
"Restore": "ongoing-request=\"true\"",
"LastModified": "2022-03-07T11:13:53+00:00",
"ContentLength": 1219493888,
"ETag": "\"ad02c999d7fe6f1fb5ddb0734017d3b0-146\"",
"ContentType": "binary/octet-stream",
"Metadata": {},
"StorageClass": "GLACIER"
}
When the files finally become available for retrieval, the "Restore" line will look like the following:
"Restore": "ongoing-request=\"false\", expiry-date=\"Wed, 20 Apr 2022 00:00:00 GMT\"",
After that, you will be able to copy the files from AWS S3 to your local disk, e.g.
aws s3 cp "s3://yourbucket/yourdir/yoursubdir/" yourlocaldir --recursive --force-glacier-transfer
Depending on the retrieval option you selected in the script for your files stored in the Amazon S3 Glacier Flexible Retrieval (formerly S3 Glacier) storage class, "Expedited" retrievals complete in 1-5 minutes, "Standard" retrievals in 3-5 hours, and "Bulk" retrievals in 5-12 hours. The "Bulk" retrieval option is the cheapest, if not free (it depends on the Glacier tier you chose to keep your files in). "Expedited" is the most expensive retrieval option and is not available for retrievals from the Amazon S3 Glacier Deep Archive storage class, for which restoration may take up to 48 hours.
By the way, you can modify the script to accept the bucket name and the directory name from the command line. In this case, replace the following two lines:
my $bucket = "yourbucket";
my $path = "yourdir/yoursubdir/";
with the following lines:
my $numargs = $#ARGV + 1;
unless ($numargs == 2) {die "Usage: perl restore-aws.pl bucket path/\n";}
my $bucket=$ARGV[0];
my $path=$ARGV[1];
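With this change, you can pass the bucket and path on the command line, for example (using the bucket and path from the question):
perl restore-aws.pl mybucket remotepath/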
Upvotes: 1