Reputation: 363
I have an Amazon S3 bucket with versioning enabled. Due to a misconfigured lifecycle policy, many of the objects in this bucket had Delete Markers added to them.
I can remove these markers from the S3 console to restore the previous versions of these objects, but there are enough objects to make doing this manually on the web console extremely time-inefficient.
Is there a way to find all Delete Markers in an S3 bucket and remove them, restoring all files in that bucket? Ideally I would like to do this from the console itself, although I will happily write a script or use the AWS CLI tools if that's the only way.
Thanks!
Upvotes: 21
Views: 46827
Reputation: 1
I created this script to restore specific deleted folders from S3:
import boto3

def restore_deleted_files(bucket_name: str, prefix: str) -> None:
    s3_client = boto3.client('s3')
    paginator = s3_client.get_paginator('list_object_versions')
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)
    for page in page_iterator:
        if 'DeleteMarkers' in page:
            for delete_marker in page['DeleteMarkers']:
                if delete_marker['IsLatest']:
                    key = delete_marker['Key']
                    version_id = delete_marker['VersionId']
                    print(f'Restoring {key} (version: {version_id})')
                    # Remove the delete marker to restore the object
                    s3_client.delete_object(
                        Bucket=bucket_name,
                        Key=key,
                        VersionId=version_id
                    )

if __name__ == "__main__":
    bucket = "my-bucket"
    folder_prefix = "deleted_folder/"
    restore_deleted_files(bucket, folder_prefix)
Upvotes: 0
Reputation: 61
This code works with or without a prefix, but if you use it without one I'm not sure it will dig deep enough to finish within the 15-minute timeout. I created an AWS Lambda function using Node.js 14 (with the timeout raised to 15 minutes) and ran it manually, though it can be automated. One of our application buckets had about ten folders that had been deleted, and since versioning was turned on, their objects had versions topped by a Delete Marker. So I had to loop through all the pages, find the objects with a delete marker, and restore them to their last version so they can be used again.

The biggest issue while thinking this through was pagination: I used recursion but kept getting the same 1,000 objects, because that is how the AWS list-objects call works. To get the next 1,000 there is a property called NextKeyMarker, which did the job in my case. The flag that indicates whether there are more pages is IsTruncated: if it is true, there are more pages, so invoke the function again with the NextKeyMarker so it knows where to start from (recursion).
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function restoreDeletedObjects(bucketName, prefix, nextMarker) {
  try {
    const params = {
      Bucket: bucketName,
      Prefix: prefix,
      KeyMarker: nextMarker,
    };
    const response = await s3.listObjectVersions(params).promise();
    console.log(response, 'response');

    if (response.DeleteMarkers.length > 0) {
      console.log('in the first if', response.DeleteMarkers.length);
      const objectsToDelete = response.DeleteMarkers.map(deleteMarker => ({
        Key: deleteMarker.Key,
        VersionId: deleteMarker.VersionId,
      }));
      console.log(objectsToDelete, 'objects to delete');

      // Remove the delete markers and restore the objects
      await s3.deleteObjects({
        Bucket: bucketName,
        Delete: { Objects: objectsToDelete, Quiet: false },
      }).promise();

      for (const deleteMarker of response.DeleteMarkers) {
        console.log(deleteMarker, 'delete Marker');
        if (deleteMarker.IsLatest) {
          console.log(`Restored: s3://${bucketName}/${deleteMarker.Key}, VersionId: ${deleteMarker.VersionId}`);
        }
      }
    }

    if (response.IsTruncated) {
      // If there are more delete markers, call the function recursively with the NextKeyMarker
      await restoreDeletedObjects(bucketName, prefix, response.NextKeyMarker);
    } else {
      console.log('Everything with a Delete Marker was restored successfully');
    }
  } catch (err) {
    console.error("Error:", err);
  }
}

exports.handler = async (event, context) => {
  const bucketName = "bucket-name-here"; // Replace with your bucket name
  const prefix = "put-your-prefix-here"; // Replace with your desired prefix
  // Our structure was Buckets -> "data" -> ten folders, each with more than
  // 15000 objects, so we used e.g. bucketName = "orders" and
  // prefix = "data/folder1", then "data/folder2", and ran it for each folder.
  await restoreDeletedObjects(bucketName, prefix, undefined);
};
Upvotes: 0
Reputation: 11
I checked the file size: a delete marker's size is `None`. This removes all the markers:
import boto3

default_session = boto3.session.Session(profile_name="default")
s3_re = default_session.resource(service_name="s3", region_name="ap-northeast-2")

for each_bucket in s3_re.buckets.all():
    bucket = s3_re.Bucket(each_bucket.name)
    for ver in bucket.object_versions.all():
        # Delete markers have no size
        if ver.size is None:
            delete_file = ver.delete()
            print(delete_file)
Upvotes: 1
Reputation: 1578
Most of the above versions are very slow on large buckets as they use `delete-object` rather than `delete-objects`. Here is a variant on the bash version which uses awk to batch the markers 100 per request:

Edit: just saw @Viacheslav's version, which also uses `delete-objects` and is nice and clean, but it will fail with large numbers of markers due to line-length issues.
#!/bin/bash
bucket=$1
prefix=$2

aws s3api list-object-versions \
    --bucket "$bucket" \
    --prefix "$prefix" \
    --query 'DeleteMarkers[][Key,VersionId]' \
    --output text |
awk '{ acc = acc "{Key=" $1 ",VersionId=" $2 "}," }
     NR % 100 == 0 { print "Objects=[" acc "],Quiet=False"; acc = "" }
     # guard against an empty final batch when the count is a multiple of 100
     END { if (acc) print "Objects=[" acc "],Quiet=False" }' |
while read batch; do
    aws s3api delete-objects --bucket "$bucket" --delete "$batch" --output text
done
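Invoke it with the bucket name and an optional prefix, e.g. `sh restore-markers.sh my-bucket some/prefix/` (the script file name here is just a placeholder).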
Upvotes: 4
Reputation: 5506
Set up a lifecycle rule to remove them after a certain number of days. Otherwise, listing the objects yourself will cost you $0.005 per 1,000 LIST requests, so the most efficient way is to set up a lifecycle rule.

Here is the step-by-step method: https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
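For reference, a minimal boto3 sketch of such a rule, assuming the goal is cleanup rather than restore (`my-bucket` and the 30-day window are placeholders). Note that `ExpiredObjectDeleteMarker` only removes a marker once no noncurrent versions of the object remain, and `NoncurrentVersionExpiration` permanently deletes old versions rather than restoring them:

import boto3

s3 = boto3.client('s3')

# 'my-bucket' is a placeholder; tune the rule to your retention needs.
s3.put_bucket_lifecycle_configuration(
    Bucket='my-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'clean-up-delete-markers',
            'Filter': {},
            'Status': 'Enabled',
            # Remove delete markers that have no remaining noncurrent versions
            'Expiration': {'ExpiredObjectDeleteMarker': True},
            # Permanently delete noncurrent versions after 30 days
            'NoncurrentVersionExpiration': {'NoncurrentDays': 30},
        }]
    },
)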
Upvotes: 1
Reputation: 157
I dealt with this problem a few weeks ago.

I eventually managed to write a function in PHP that removes the delete markers from the latest versions of the files within a prefix. Personally, it worked perfectly, and in one pass of this script, iterating through all my prefixes, I managed to mend my own mistake of unintentionally deleting many S3 objects.

I leave my implementation in PHP below:
private function restore_files($file)
{
    // Get the underlying AWS SDK S3Client from the storage driver
    $storage = get_storage()->getDriver()->getAdapter()->getClient();
    $bucket_name = 'my_bucket_name';

    $s3_path = $file->s3_path;
    $restore_folder_path = pathinfo($s3_path, PATHINFO_DIRNAME);

    $data = $storage->listObjectVersions([
        'Bucket' => $bucket_name,
        'Prefix' => $restore_folder_path,
    ]);
    $data_array = $data->toArray();
    $deleteMarkers = $data_array['DeleteMarkers'];

    foreach ($deleteMarkers as $key => $delete_marker) {
        if ($delete_marker["IsLatest"]) {
            $objkey = $delete_marker["Key"];
            $objVersionId = $delete_marker["VersionId"];

            // Deleting the delete marker restores the previous version
            $delete_response = $storage->deleteObjectAsync([
                'Bucket' => $bucket_name,
                'Key' => $objkey,
                'VersionId' => $objVersionId,
            ]);
        }
    }
}
Upvotes: 3
Reputation: 5509
Here's a sample Python implementation:
import boto3
import botocore

BUCKET_NAME = 'BUCKET_NAME'

s3 = boto3.resource('s3')

def main():
    bucket = s3.Bucket(BUCKET_NAME)
    versions = bucket.object_versions
    for version in versions.all():
        if is_delete_marker(version):
            version.delete()

def is_delete_marker(version):
    try:
        # note head() is faster than get()
        version.head()
        return False
    except botocore.exceptions.ClientError as e:
        if 'x-amz-delete-marker' in e.response['ResponseMetadata']['HTTPHeaders']:
            return True
        # an older version of the key but not a DeleteMarker
        elif '404' == e.response['Error']['Code']:
            return False

if __name__ == '__main__':
    main()
For some context for this answer see: https://docs.aws.amazon.com/AmazonS3/latest/dev/DeleteMarker.html
If you try to get an object and its current version is a delete marker, Amazon S3 responds with:
- A 404 (Object not found) error
- A response header, x-amz-delete-marker: true
The response header tells you that the object accessed was a delete marker. This response header never returns false; if the value is false, Amazon S3 does not include this response header in the response.
The only way to list delete markers (and other versions of an object) is by using the versions subresource in a GET Bucket versions request. A simple GET does not retrieve delete marker objects.
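For illustration, a minimal boto3 sketch of listing delete markers through the client-level versions API (`my-bucket` is a placeholder):

import boto3

s3_client = boto3.client('s3')

# The versions subresource (ListObjectVersions) is what exposes delete markers
paginator = s3_client.get_paginator('list_object_versions')
for page in paginator.paginate(Bucket='my-bucket'):
    for marker in page.get('DeleteMarkers', []):
        print(marker['Key'], marker['VersionId'], marker['IsLatest'])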
Unfortunately, despite what is written in https://github.com/boto/botocore/issues/674, checking if `ObjectVersion.size` is `None` is not a reliable way to determine whether a version is a delete marker, as it will also be true for previously deleted versions of folder keys.

Currently, boto3 is missing a straightforward way to determine if an `ObjectVersion` is a DeleteMarker. See https://github.com/boto/boto3/issues/1769

However, the `ObjectVersion.head()` and `.get()` operations will throw an exception on an `ObjectVersion` that is a DeleteMarker. Catching this exception is likely the only reliable way of determining whether an `ObjectVersion` is a DeleteMarker.
Upvotes: 24
Reputation: 1286
Use this to restore the files inside a specific folder. I've used AWS CLI commands in my script. Provide input as: sh scriptname.sh bucketname path/to/a/folder
**Script:**
#!/bin/bash
# Please provide the bucket name and the path to the folder to restore
# Remove the delete marker from each object under the prefix
aws s3api list-object-versions --bucket "$1" --prefix "$2" --output text |
grep "DELETEMARKERS" | while read obj
do
    KEY=$(echo "$obj" | awk '{print $3}')
    VERSION_ID=$(echo "$obj" | awk '{print $5}')
    echo "$KEY"
    echo "$VERSION_ID"
    aws s3api delete-object --bucket "$1" --key "$KEY" --version-id "$VERSION_ID"
done
Edit: put `$VERSION_ID` in the correct position in the script.
Upvotes: 27
Reputation: 1465
Define variables
PROFILE="personal"
REGION="eu-west-1"
BUCKET="mysql-backend-backups-prod"
Delete DeleteMarkers at once
aws --profile $PROFILE s3api delete-objects \
--region $REGION \
--bucket $BUCKET \
--delete "$(aws --profile $PROFILE s3api list-object-versions \
--region $REGION \
--bucket $BUCKET \
--output=json \
--query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}')"
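Note that `delete-objects` accepts at most 1,000 keys per request, and the inline `$(...)` substitution is subject to command-line length limits, so as noted in another answer this one-shot approach can fail on buckets with a large number of delete markers.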
Delete versions at once
aws --profile $PROFILE s3api delete-objects \
--region $REGION \
--bucket $BUCKET \
--delete "$(aws --profile $PROFILE s3api list-object-versions \
--region $REGION \
--bucket $BUCKET \
--output=json \
--query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"
And delete S3 bucket afterward
aws --profile $PROFILE s3api delete-bucket \
--region $REGION \
--bucket $BUCKET
Upvotes: 11
Reputation: 91
I just wrote a program (using boto) to solve the same problem:
from boto.s3 import deletemarker
from boto.s3.connection import S3Connection

# Open a connection (credentials come from the environment/boto config)
conn = S3Connection()

def restore_bucket(bucket_name):
    bucket = conn.get_bucket(bucket_name)
    for version in bucket.list_versions():
        if isinstance(version, deletemarker.DeleteMarker) and version.is_latest:
            bucket.delete_key(version.name, version_id=version.version_id)
If you need to restore folders within the versioned buckets, the rest of the program I wrote can be found here.
Upvotes: 9
Reputation: 269320
You would need to write a program to:

- list the versions of the objects in the bucket
- identify the objects whose current version is a delete marker
- delete those delete markers, which makes the previous versions current again

This could be done fairly easily using an SDK, such as `boto`.
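As a minimal boto3 sketch of those three steps (`my-bucket` is a placeholder; the paginator handles buckets with more than 1,000 versions):

import boto3

s3_client = boto3.client('s3')
paginator = s3_client.get_paginator('list_object_versions')

for page in paginator.paginate(Bucket='my-bucket'):
    for marker in page.get('DeleteMarkers', []):
        if marker['IsLatest']:
            # Deleting the marker makes the previous version current again
            s3_client.delete_object(Bucket='my-bucket',
                                    Key=marker['Key'],
                                    VersionId=marker['VersionId'])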
The AWS Command-Line Interface (CLI) can also be used, but you would have to build a script around it to capture the IDs and then delete the markers.
Upvotes: 5