Reputation: 416
I have implemented S3DeleteObjectsOperator, but even though task shows key deleted successfully, in reality the object does not get deleted from the S3 bucket.
delete_s3bucket_files = S3DeleteObjectsOperator(
task_id='delete_s3bucket_files',
start_date=start_date,
bucket='****',
keys='******************',
aws_conn_id='aws_default',
)
Even though the task is getting completed as passed, it does not delete objects inside the specified key in the bucket. I can see the logs below:
[2019-09-26 14:19:15,554] {base_task_runner.py:101}INFO - Job 1435: Subtask delete_s3bucket_files [2019-09-26 14:19:15,553] {cli.py:517} INFO - Running <TaskInstance: test_s3_delete.delete_s3bucket_files 2019-09-26T12:18:59.362470+00:00 [running]> on host Saurav-macbook.local
[2019-09-26 14:19:15,883] {s3_delete_objects_operator.py:83} INFO - Deleted: ['******************']
How can I understand what the task is performing and why does the object does not get deleted?
Upvotes: 2
Views: 4361
Reputation: 113
you could try to use it this way using prefix
:
from airflow.providers.amazon.aws.operators.s3 import S3DeleteObjectsOperator
clear_s3_with_prefix_task = S3DeleteObjectsOperator(
task_id="clear_s3_with_prefix_task",
bucket=<s3_bucket_name>,
# All objects matching this prefix in the bucket will be deleted:
prefix=<your_s3_path>,
aws_conn_id=<s3_conn_id>',
dag=dag
)
Upvotes: 1
Reputation: 11
I'm also facing the same issue, like the dag is getting success but that partition is still there. At last, i found out that I'm not passing the keys properly Like x is your bucket name with object as y/year/2021/07 So, bucket='x' keys='y/year/2021/07' You have to pass the complete path of the object in the keys without s3://x, It'll work fine after this
delete_s3bucket_files = S3DeleteObjectsOperator( task_id='delete_s3bucket_files',bucket='x',keys='y/year/2021/07',aws_conn_id='aws_default',dag=dag)
Upvotes: 1
Reputation: 416
I couldn't get the object deletion working with S3DeleteObjectOperator
as well as the delete_objects()
method in S3Hook
. So instead I used the boto3
delete()
method to delete the object.
def delete_files():
s3 = boto3.resource('s3', aws_access_key_id='****', aws_secret_access_key='******************')
s3_bucket = s3.Bucket('****')
s3_bucket.objects.all().delete()
delete_s3bucket_files = PythonOperator(
task_id='delete_s3bucket_files',
start_date=start_date,
python_callable=delete_files,
dag=dag
)
Not sure if this is the right way to do, but works for me as of now.
Upvotes: 2