Saurav Ganguli
Saurav Ganguli

Reputation: 416

S3_delete_objects_operator does not delete files in the bucket

I have implemented S3DeleteObjectsOperator, but even though task shows key deleted successfully, in reality the object does not get deleted from the S3 bucket.

delete_s3bucket_files = S3DeleteObjectsOperator(
  task_id='delete_s3bucket_files',
  start_date=start_date,
  bucket='****',
  keys='******************',
  aws_conn_id='aws_default',
)

Even though the task is getting completed as passed, it does not delete objects inside the specified key in the bucket. I can see the logs below:

[2019-09-26 14:19:15,554] {base_task_runner.py:101}INFO - Job 1435: Subtask delete_s3bucket_files [2019-09-26 14:19:15,553] {cli.py:517} INFO - Running <TaskInstance: test_s3_delete.delete_s3bucket_files 2019-09-26T12:18:59.362470+00:00 [running]> on host Saurav-macbook.local
[2019-09-26 14:19:15,883] {s3_delete_objects_operator.py:83} INFO - Deleted: ['******************']

How can I understand what the task is performing and why does the object does not get deleted?

Upvotes: 2

Views: 4361

Answers (3)

Bosco Han
Bosco Han

Reputation: 113

you could try to use it this way using prefix:

from airflow.providers.amazon.aws.operators.s3 import S3DeleteObjectsOperator

clear_s3_with_prefix_task = S3DeleteObjectsOperator(
    task_id="clear_s3_with_prefix_task",
    bucket=<s3_bucket_name>,
    # All objects matching this prefix in the bucket will be deleted:
    prefix=<your_s3_path>,
    aws_conn_id=<s3_conn_id>',
    dag=dag
)

Upvotes: 1

Abhay Partap Singh
Abhay Partap Singh

Reputation: 11

I'm also facing the same issue, like the dag is getting success but that partition is still there. At last, i found out that I'm not passing the keys properly Like x is your bucket name with object as y/year/2021/07 So, bucket='x' keys='y/year/2021/07' You have to pass the complete path of the object in the keys without s3://x, It'll work fine after this

delete_s3bucket_files = S3DeleteObjectsOperator(  task_id='delete_s3bucket_files',bucket='x',keys='y/year/2021/07',aws_conn_id='aws_default',dag=dag)

Upvotes: 1

Saurav Ganguli
Saurav Ganguli

Reputation: 416

I couldn't get the object deletion working with S3DeleteObjectOperator as well as the delete_objects() method in S3Hook. So instead I used the boto3 delete() method to delete the object.

def delete_files():
  s3 = boto3.resource('s3', aws_access_key_id='****', aws_secret_access_key='******************')
  s3_bucket = s3.Bucket('****')
  s3_bucket.objects.all().delete()


delete_s3bucket_files = PythonOperator(
  task_id='delete_s3bucket_files',
  start_date=start_date,
  python_callable=delete_files,
  dag=dag
)

Not sure if this is the right way to do, but works for me as of now.

Upvotes: 2

Related Questions