Reputation: 1890
I am using Glue job bookmarks to process data. My job is scheduled to run every day, but it can also be launched manually. Because I use bookmarks, the job sometimes starts with no new data to process, and the DataFrame it reads is then empty. In this case, I want to end the job cleanly because it has nothing to do. I tried:
if df.rdd.isEmpty():
    job.commit()
    sys.exit(0)
However, my job terminates with the error SystemExit: 0.
How can I end the job with a success status?
Upvotes: 3
Views: 7680
Reputation: 181
if df.rdd.isEmpty():
    raise Exception("Procedure failed, stopping Glue job.")
Raising an exception worked for me and stopped the job, but it results in a "Failed" Glue job status. In my case, I wanted it to fail.
Upvotes: 0
Reputation: 81
Just using os._exit() doesn't work in Glue version 3.0.
To exit a job gracefully after some conditions have been met, use:
import os
import sys
.
. # Your Glue Job Code
.
logger.info("Exiting job gracefully...")  # Or a simple print: print("...")
job.commit()  # Only necessary if you are loading data from S3 and have job bookmarks enabled.
os._exit(0)  # A 0 status code raises no exception, so the job completes with a succeeded status.
But if you want to exit with an error, use:
sys.exit("Error Message...")  # Exits with an error message that is displayed in the Glue UI under Run Details; the job gets a failed status.
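The difference in exit status can be checked outside Glue with a small sketch. It runs two throwaway child interpreters with `subprocess` and reports the exit code of each. The helper name `run_snippet` is made up for this illustration, and the snippet only shows plain Python behavior; how Glue maps these codes to a job status is as described above, not something this code verifies.

```python
import subprocess
import sys


def run_snippet(code: str) -> subprocess.CompletedProcess:
    """Run a snippet in a child Python interpreter (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


# os._exit(0) ends the process immediately with status 0.
ok = run_snippet("import os; os._exit(0)")

# sys.exit("message") writes the message to stderr and exits with status 1.
failed = run_snippet("import sys; sys.exit('Error Message...')")

print(ok.returncode)
print(failed.returncode)
print(failed.stderr.strip())
```

A string argument to `sys.exit` is treated as a failure with exit status 1, while an integer argument is used as the status itself.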
Upvotes: 8
Reputation: 1890
After some testing, I learned from @Glyph's answer that os._exit() terminates immediately at the C level and does not perform any of the normal tear-downs of the interpreter.
Which is exactly what I was looking for. The final solution is:
import os
if df.rdd.isEmpty():
    job.commit()  # Save the bookmark state before exiting
    os._exit(0)  # os._exit() requires a status argument; 0 means success
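Here is a rough sketch of why `sys.exit(0)` can surface as an error while `os._exit(0)` does not. `sys.exit` raises a `SystemExit` exception that any surrounding code can intercept, whereas `os._exit` ends the process without unwinding the stack. The wrapper function below is a stand-in I made up, and only an assumption about how a job runner might behave; it is not Glue's actual code.

```python
import subprocess
import sys


def run_like_a_wrapper(task):
    """Hypothetical runner that reports any exception, including SystemExit."""
    try:
        task()
    except BaseException as exc:  # SystemExit derives from BaseException
        return f"{type(exc).__name__}: {exc}"
    return "finished"


result = run_like_a_wrapper(lambda: sys.exit(0))
print(result)  # SystemExit: 0

# os._exit skips `finally` blocks, so it has to be tried in a child process.
child_code = """
import os
try:
    os._exit(0)
finally:
    print("cleanup ran")
"""
child = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
)
print(child.returncode, repr(child.stdout))
```

Since `os._exit` also skips flushing of buffered output, it is probably worth making sure any final log message is flushed before calling it.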
Upvotes: 3