Dave Stern

Reputation: 363

How to Remove Delete Markers from Multiple Objects on Amazon S3 at once

I have an Amazon S3 bucket with versioning enabled. Due to a misconfigured lifecycle policy, many of the objects in this bucket had Delete Markers added to them.

I can remove these markers from the S3 console to restore the previous versions of these objects, but there are enough objects to make doing this manually on the web console extremely time-inefficient.

Is there a way to find all Delete Markers in an S3 bucket and remove them, restoring all files in that bucket? Ideally I would like to do this from the console itself, although I will happily write a script or use the AWS CLI tools to do this if that's the only way.

Thanks!

Upvotes: 21

Views: 46827

Answers (11)

Jay Saini

Reputation: 1

I created this script to restore specific deleted folders from S3:

import boto3

def restore_deleted_files(bucket_name: str, prefix: str) -> None:
    s3_client = boto3.client('s3')

    paginator = s3_client.get_paginator('list_object_versions')
    page_iterator = paginator.paginate(Bucket=bucket_name, Prefix=prefix)

    for page in page_iterator:
        if 'DeleteMarkers' in page:
            for delete_marker in page['DeleteMarkers']:
                if delete_marker['IsLatest']:
                    key = delete_marker['Key']
                    version_id = delete_marker['VersionId']

                    print(f'Restoring {key} (version: {version_id})')
                    # Remove the delete marker to restore the object
                    s3_client.delete_object(
                        Bucket=bucket_name,
                        Key=key,
                        VersionId=version_id
                    )

if __name__ == "__main__":
    bucket = "my-bucket"
    folder_prefix = "deleted_folder/"
    restore_deleted_files(bucket, folder_prefix)

Upvotes: 0

Petar Dzhunov

Reputation: 61

This code works with or without a prefix, though without a prefix I'm not sure it will dig deep enough in a large bucket before the 15-minute timeout. I created an AWS Lambda function using Node.js 14 (with the timeout raised to 15 minutes) and ran it manually, although it can be automated. One of our application buckets had about 10 folders that had been deleted, and since versioning was turned on, the objects still existed as versions behind Delete Markers. So I had to loop through all the pages, find the versions with a Delete Marker, and restore their latest version so they could be used again.

The biggest issue was pagination: the AWS list-objects call returns the same first 1000 objects each time, so plain recursion kept fetching the same page. To get the next 1000 there is a property called NextKeyMarker, which did the job in my case. The flag that indicates whether there are more pages is IsTruncated: if it is true, there are more pages, so invoke the function again with NextKeyMarker so it knows where to start from (recursion).

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function restoreDeletedObjects(bucketName, prefix, nextMarker) {
  try {
    const params = {
      Bucket: bucketName,
      Prefix: prefix,
      KeyMarker: nextMarker,
    };

    const response = await s3.listObjectVersions(params).promise();
    console.log(response, 'response');

    if (response.DeleteMarkers && response.DeleteMarkers.length > 0) {
      console.log('in the first if', response.DeleteMarkers.length);
      const objectsToDelete = response.DeleteMarkers.map(deleteMarker => ({
        Key: deleteMarker.Key,
        VersionId: deleteMarker.VersionId,
      }));

      console.log(objectsToDelete, 'objects to delete');
      // Remove the delete markers and restore the objects
      await s3.deleteObjects({
        Bucket: bucketName,
        Delete: { Objects: objectsToDelete, Quiet: false },
      }).promise();

      for (const deleteMarker of response.DeleteMarkers) {
        console.log(deleteMarker, 'delete Marker');
        if (deleteMarker.IsLatest) {
          console.log(`Restored: s3://${bucketName}/${deleteMarker.Key}, VersionId: ${deleteMarker.VersionId}`);
        }
      }
    }

    if (response.IsTruncated) {
      // If there are more delete markers, call the function recursively with the NextKeyMarker
      await restoreDeletedObjects(bucketName, prefix, response.NextKeyMarker);
    } else {
      console.log('Everything with a Delete Marker was restored successfully');
    }
  } catch (err) {
    console.error("Error:", err);
  }
}

exports.handler = async (event, context) => {
  const bucketName = "bucket-name-here"; // Replace with your bucket name
  const prefix = "put-your-prefix-here"; // Replace with your desired prefix
  // Our structure was: bucket -> "data" -> 10 folders, each with more than
  // 15000 objects, so we ran this once per folder, e.g.:
  // const bucketName = "orders"
  // const prefix = "data/folder1", then "data/folder2", and so on.

  await restoreDeletedObjects(bucketName, prefix, undefined);
};

Upvotes: 0

Sungsoo. Kim

Reputation: 11

I checked the file size: a delete marker's size is None. So this removes every version whose size is None, i.e. all the markers, across all buckets:

import boto3

default_session = boto3.session.Session(profile_name="default")
s3_re = default_session.resource(service_name="s3", region_name="ap-northeast-2")
for each_bucket in s3_re.buckets.all():
    bucket = s3_re.Bucket(each_bucket.name)
    for ver in bucket.object_versions.all():
        # delete markers report their size as None
        if ver.size is None:
            print(ver.delete())

Upvotes: 1

Typhlosaurus

Reputation: 1578

Most of the above versions are very slow on large buckets because they use delete-object rather than delete-objects. Here is a variant of the bash version which uses awk to issue one request per 100 markers:

Edit: just saw @Viacheslav's version which also uses delete-objects and is nice and clean, but will fail with large numbers of markers due to line length issues.

#!/bin/bash

bucket=$1
prefix=$2

aws s3api list-object-versions \
    --bucket "$bucket" \
    --prefix "$prefix" \
    --query 'DeleteMarkers[][Key,VersionId]' \
    --output text |
awk '{ acc = acc "{Key=" $1 ",VersionId=" $2 "}," }
     NR % 100 == 0 { print "Objects=[" acc "],Quiet=False"; acc = "" }
     END { if (acc != "") print "Objects=[" acc "],Quiet=False" }' |
while read batch; do
    aws s3api delete-objects --bucket "$bucket" --delete "$batch" --output text
done

Upvotes: 4

Bira

Reputation: 5506

Set up a lifecycle rule to remove them after a certain number of days. Otherwise listing them yourself will cost you $0.005 per 1,000 LIST requests.

So the most efficient way is setting up a lifecycle rule.

Here is the step by step method. https://docs.aws.amazon.com/AmazonS3/latest/user-guide/create-lifecycle.html
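For reference, the same rule can also be created programmatically. Here is a minimal boto3 sketch (the bucket name and rule ID are placeholders) that enables the lifecycle action for expired object delete markers:

```python
def build_lifecycle_config() -> dict:
    # One rule: remove "expired" object delete markers, i.e. markers
    # whose object has no remaining noncurrent versions.
    return {
        "Rules": [
            {
                "ID": "purge-expired-delete-markers",  # arbitrary rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # apply to the whole bucket
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }

if __name__ == "__main__":
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket="my-bucket",  # placeholder bucket name
        LifecycleConfiguration=build_lifecycle_config(),
    )
```

Note one caveat: the ExpiredObjectDeleteMarker action only cleans up markers whose object has no other versions left, so it tidies orphaned markers rather than restoring objects that still have older versions behind the marker.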

Upvotes: 1

Agus Trombotto

Reputation: 157

I was dealing with this problem a few weeks ago.

Finally I managed to write a PHP function that deletes the delete markers from the latest versions of the files within a prefix. Personally, it worked perfectly and, in one pass of this script, iterating through all the prefixes, I managed to mend my own mistake of unintentionally deleting many S3 objects.

I leave my implementation in PHP below :

private function restore_files($file)
{
    $storage = get_storage()->getDriver()->getAdapter()->getClient();
    $bucket_name = 'my_bucket_name';
    $s3_path=$file->s3_path;

    $restore_folder_path = pathinfo($s3_path, PATHINFO_DIRNAME);

    $data = $storage->listObjectVersions([
        'Bucket' => $bucket_name,
        'Prefix' => $restore_folder_path,
    ]);

    $data_array = $data->toArray();
    $deleteMarkers = $data_array['DeleteMarkers'];

    foreach ($deleteMarkers as $key => $delete_marker) {
        if ($delete_marker["IsLatest"]) {
            $objkey = $delete_marker["Key"];
            $objVersionId = $delete_marker["VersionId"];

            $delete_response = $storage->deleteObjectAsync([
                'Bucket' => $bucket_name,
                'Key' => $objkey,
                'VersionId' => $objVersionId
            ]);
        }
    }
}

Some considerations about the script:

  1. The code was implemented using the Laravel framework, so in the variable $storage I get the plain PHP SDK client, without Laravel's wrapper. So the $storage variable is the Client object of the S3 SDK. Here is the documentation that I have used.
  2. The $file parameter that the function receives is an object that has the s3_path among its properties. So in the $restore_folder_path variable, I get the prefix of the object's S3 path.
  3. Finally, I list all the object versions under the prefix in S3. I iterate over the DeleteMarkers list and check whether the current entry is the latest delete marker. If it is, I call deleteObject with the specific version ID of the marker I want to remove. This is the way the S3 documentation specifies to remove a delete marker.

Upvotes: 3

Tomasz

Reputation: 5509

Here's a sample Python implementation:

import boto3
import botocore

BUCKET_NAME = 'BUCKET_NAME'
s3 = boto3.resource('s3')


def main():
    bucket = s3.Bucket(BUCKET_NAME)
    versions = bucket.object_versions

    for version in versions.all():
        if is_delete_marker(version):
             version.delete()


def is_delete_marker(version):
    try:
        # note: head() is faster than get()
        version.head()
        return False
    except botocore.exceptions.ClientError as e:
        if 'x-amz-delete-marker' in e.response['ResponseMetadata']['HTTPHeaders']:
            return True
        # an older version of the key, but not a DeleteMarker
        if '404' == e.response['Error']['Code']:
            return False
        # anything else is unexpected: re-raise instead of returning None
        raise


if __name__ == '__main__':
    main()

For some context for this answer see: https://docs.aws.amazon.com/AmazonS3/latest/dev/DeleteMarker.html

If you try to get an object and its current version is a delete marker, Amazon S3 responds with:

  • A 404 (Object not found) error
  • A response header, x-amz-delete-marker: true

The response header tells you that the object accessed was a delete marker. This response header never returns false; if the value is false, Amazon S3 does not include this response header in the response.

The only way to list delete markers (and other versions of an object) is by using the versions subresource in a GET Bucket versions request. A simple GET does not retrieve delete marker objects.

Unfortunately, despite what is written in https://github.com/boto/botocore/issues/674, checking if ObjectVersion.size is None is not a reliable way to determine if a version is a delete marker as it will also be true for previously deleted versions of folder keys.

Currently, boto3 is missing a straightforward way to determine if an ObjectVersion is a DeleteMarker. See https://github.com/boto/boto3/issues/1769

However, ObjectVersion.head() and .Get() operations will throw an exception on an ObjectVersion that is a DeleteMarker. Catching this exception is likely the only reliable way of determining if an ObjectVersion is a DeleteMarker.

Upvotes: 24

Kc Bickey

Reputation: 1286

Use this to restore the files inside a specific folder. I've used AWS CLI commands in my script. Provide input as: sh scriptname.sh bucketname path/to/a/folder

Script:

#!/bin/bash
# please provide the bucket name and the path to the folder to restore
# remove the delete marker for each object
aws s3api list-object-versions --bucket "$1" --prefix "$2" --output text |
grep "DELETEMARKERS" | while read obj
do
    KEY=$(echo "$obj" | awk '{print $3}')
    VERSION_ID=$(echo "$obj" | awk '{print $5}')
    echo "$KEY"
    echo "$VERSION_ID"
    aws s3api delete-object --bucket "$1" --key "$KEY" --version-id "$VERSION_ID"
done

Edit: put $VERSION_ID in correct position in the script

Upvotes: 27

Viacheslav

Reputation: 1465

Define variables

PROFILE="personal"
REGION="eu-west-1"
BUCKET="mysql-backend-backups-prod"

Delete DeleteMarkers at once

aws --profile $PROFILE s3api delete-objects \
    --region $REGION \
    --bucket $BUCKET \
    --delete "$(aws --profile $PROFILE s3api list-object-versions \
                    --region $REGION \
                    --bucket $BUCKET \
                    --output=json \
                    --query='{Objects: DeleteMarkers[].{Key:Key,VersionId:VersionId}}')"

Delete versions at once

aws --profile $PROFILE s3api delete-objects \
    --region $REGION \
    --bucket $BUCKET \
    --delete "$(aws --profile $PROFILE s3api list-object-versions \
                    --region $REGION \
                    --bucket $BUCKET \
                    --output=json \
                    --query='{Objects: Versions[].{Key:Key,VersionId:VersionId}}')"

And delete S3 bucket afterward

aws --profile $PROFILE s3api delete-bucket \
    --region $REGION \
    --bucket $BUCKET

Upvotes: 11

H Lemos

Reputation: 91

I just wrote a program (using boto) to solve the same problem:

from boto.s3 import deletemarker
from boto.s3.connection import S3Connection

def restore_bucket(bucket_name):
    conn = S3Connection()
    bucket = conn.get_bucket(bucket_name)
    for version in bucket.list_versions():
        if isinstance(version, deletemarker.DeleteMarker) and version.is_latest:
            bucket.delete_key(version.name, version_id=version.version_id)

If you need to restore folders within the versioned buckets, the rest of the program I wrote can be found here.

Upvotes: 9

John Rotenstein

Reputation: 269320

You would need to write a program to:

  • Loop through all objects in the Amazon S3 bucket
  • Retrieve the version IDs for each version of each object
  • Delete the delete markers

This could be done fairly easily using the SDK, such as boto.

The AWS Command-Line Interface (CLI) can also be used, but you would have to build a script around it to capture the IDs and then delete the markers.
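The loop described above can be sketched with boto3 (the bucket name is a placeholder; the marker-filtering step is split out as a pure function so the logic is easy to follow):

```python
def latest_delete_markers(page: dict) -> list:
    # From one list_object_versions page, keep only the delete markers
    # that are the current version -- removing those restores the object.
    return [
        {"Key": m["Key"], "VersionId": m["VersionId"]}
        for m in page.get("DeleteMarkers", [])
        if m.get("IsLatest")
    ]

if __name__ == "__main__":
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket="my-bucket"):  # placeholder bucket
        markers = latest_delete_markers(page)
        if markers:
            # delete_objects accepts up to 1000 keys per request,
            # which matches the page size of list_object_versions
            s3.delete_objects(
                Bucket="my-bucket",
                Delete={"Objects": markers, "Quiet": True},
            )
```

Using batched delete_objects calls instead of one delete_object per marker keeps the request count (and runtime) roughly 1000x lower on large buckets.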

Upvotes: 5
