calycolor

Reputation: 736

How to stop / exit an AWS Glue Job (PySpark)?

I have a successfully running AWS Glue Job that transforms data for predictions. I would like to stop processing and output a status message (this part already works) when I reach a specific condition:

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path )
    return None

This produces "SyntaxError: 'return' outside function", so I tried:

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path )
    job.commit()

This is not running in AWS Lambda; it is a Glue Job that gets started from Lambda (e.g., with start_job_run()).

Upvotes: 7

Views: 8295

Answers (3)

amsh

Reputation: 3387

[This answer may not apply to the latest Glue job versions; please refer to Jérémy's answer.]

There is no return in Glue Spark jobs, because the script body is not inside a function. job.commit() only signals to Glue that the job's task has completed; the script keeps running after it. To end your job once your processing is complete, you'll have to:

  1. Call sys.exit(STATUS_CODE). The status code can be any value.
  2. Structure your conditions so that the job has no lines of code left to run after job.commit().

Please note that if sys.exit() is called before job.commit(), the Glue job will be marked as failed.
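One way to read point 2 is to move the script body into a function, so that an early return becomes legal and job.commit() is the last statement that runs. The sketch below is only an illustration of that structure: run, main and the log list are hypothetical names, and the log.append(...) calls are stand-ins for s3.put_object(...), the real transformations and job.commit(), none of which are called here.

```python
def run(specific_condition, log):
    """Hypothetical job body: returns early when there is nothing to process."""
    if specific_condition is None:
        log.append("status written")  # stand-in for s3.put_object(...)
        return "skipped"              # legal here, because we are inside a function
    log.append("transform done")      # stand-in for the real transformations
    return "processed"


def main(specific_condition, log):
    """Run the job body, then commit as the final step on every path."""
    outcome = run(specific_condition, log)
    log.append("commit")              # stand-in for job.commit()
    return outcome
```

In a real script you would call main(...) once at the bottom of the file, so that nothing executes after the commit. Whether this is enough on your Glue version is something to verify yourself.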

Upvotes: 1

ALTAF HUSSAIN

Reputation: 375

If you click on Jobs and then click the relevant job, you will see an X mark next to "Running" in the job status.

Image of the X in the running job status

For reference please check https://forums.aws.amazon.com/thread.jspa?threadID=262217

Upvotes: -2

Jérémy

Reputation: 1900

Since @amsh's solution did not work for me, I kept looking for a solution and discovered that:

os._exit() terminates immediately at the C level and does not perform any of the normal tear-downs of the interpreter.

Thanks to @Glyph's answer! You can then proceed this way:

import os

if specific_condition is None:
    s3.put_object(Body=json_str, Bucket=output_bucket, Key=json_path)
    job.commit()
    os._exit(0)  # os._exit() requires an exit status; 0 signals success

Your job will succeed instead of terminating with a "SystemExit: 0" error.
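To see why the two calls behave differently, here is a small sketch that runs outside Glue, using plain Python only. It starts two child interpreters: one calls sys.exit(0), which raises SystemExit and therefore still runs the finally block, and one calls os._exit(0), which ends the process on the spot. run_snippet and the two snippet strings are hypothetical names made up for this illustration.

```python
import subprocess
import sys


def run_snippet(code):
    """Run code in a fresh interpreter and return (exit code, stripped stdout)."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    return result.returncode, result.stdout.strip()


# sys.exit() raises SystemExit, so the finally block still runs.
SYS_EXIT_SNIPPET = """
import sys
try:
    sys.exit(0)
finally:
    print("cleanup ran")
"""

# os._exit() ends the process immediately, so the finally block is skipped.
OS_EXIT_SNIPPET = """
import os
try:
    os._exit(0)
finally:
    print("cleanup ran")
"""

sys_exit_result = run_snippet(SYS_EXIT_SNIPPET)
os_exit_result = run_snippet(OS_EXIT_SNIPPET)
```

Because os._exit() also skips flushing stdio buffers and running atexit handlers, it is safest to finish any writes you care about (such as the status object in S3) before calling it.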

Upvotes: 6
