Reputation: 5660
Exactly as in this AWS forum question, I was running 2 Jobs concurrently. The Job was configured with Max concurrency: 10, but when job.commit() executes I receive this error message:
py4j.protocol.Py4JJavaError: An error occurred while calling z:com.amazonaws.services.glue.util.Job.commit.
: com.amazonaws.services.gluejobexecutor.model.VersionMismatchException:
Continuation update failed due to version mismatch. Expected version 6 but found version 7
(Service: AWSGlueJobExecutor; Status Code: 400; Error Code: VersionMismatchException; Request ID: 123)
The two Jobs read different portions of data.
I can't understand what the problem is or how to deal with it. Can anyone help?
Upvotes: 5
Views: 4174
Reputation: 2855
The default JobName for your bookmark is the Glue JOB_NAME, but it doesn't have to be.
Consider a Glue job called JobA that executes concurrently with different input parameters. You have two concurrent executions, each given an input parameter contextName. Let's call the values passed into this parameter contextA and contextB.
The default initialisation in your PySpark script is:
job.init(args['JOB_NAME'], args)
but you can change the name so that it is unique for each execution context:
job.init(args['JOB_NAME'] + args['contextName'], args)
This name is unique for each concurrent execution, so the bookmarks never clash. To view the bookmark state for this job from the CLI, you'd need to query it like this:
aws glue get-job-bookmark --job-name "JobAcontextA"
or
aws glue get-job-bookmark --job-name "JobAcontextB"
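The same lookup could also be done from Python. The following is only a sketch: it assumes boto3 is available and that the Glue API accepts the custom bookmark name as JobName, the same way the CLI commands above do. The helper names bookmark_names and fetch_bookmarks are made up for illustration, and the client is passed in so the function can be tried without AWS access.

```python
def bookmark_names(job_name, contexts):
    """Build one bookmark name per execution context by plain concatenation."""
    return [job_name + context for context in contexts]


def fetch_bookmarks(glue_client, names):
    """Return the bookmark entry for each name.

    glue_client is expected to behave like boto3.client("glue");
    get_job_bookmark is the API behind `aws glue get-job-bookmark`.
    """
    entries = {}
    for name in names:
        response = glue_client.get_job_bookmark(JobName=name)
        entries[name] = response["JobBookmarkEntry"]
    return entries


# Usage in an environment with AWS credentials (assumption: boto3 installed):
#   import boto3
#   glue = boto3.client("glue")
#   print(fetch_bookmarks(glue, bookmark_names("JobA", ["contextA", "contextB"])))
#   glue.reset_job_bookmark(JobName="JobAcontextA")  # reset one bookmark
```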
You wouldn't be able to use the UI to pause or reset the bookmark; you'd need to do it programmatically.
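For context, here is a minimal sketch of how the per-context initialisation might sit inside a full Glue script. It assumes the run is started with a --contextName argument; the helper bookmark_job_name and the main wrapper are names invented for this example, and the awsglue imports only resolve inside the Glue runtime, so they are kept inside main.

```python
import sys


def bookmark_job_name(job_name, context_name):
    """Bookmark name for one execution context (plain concatenation)."""
    return job_name + context_name


def main(argv):
    # These modules are only available inside the AWS Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    # 'contextName' must be supplied to the run as --contextName <value>.
    args = getResolvedOptions(argv, ["JOB_NAME", "contextName"])
    glue_context = GlueContext(SparkContext.getOrCreate())

    job = Job(glue_context)
    # Initialise the bookmark under a name unique to this context.
    job.init(bookmark_job_name(args["JOB_NAME"], args["contextName"]), args)

    # ... read, transform and write data here ...

    job.commit()
```

In a Glue script you would end the file with main(sys.argv). A run of JobA with --contextName contextA then keeps its bookmark under "JobAcontextA".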
Upvotes: 8
Reputation: 5660
Quoting @bgiannini's answer from this other AWS forum question, it looks like the "version" refers to job bookmarking.
If multiple instances of the same job are running simultaneously (i.e. max concurrency > 1) and using bookmarks, when job run 1 runs job.init() it gets a version and job.commit() seems to expect a certain value (+1 to version for every job.commit that is executed I guess?). If job run 2 started at the same time and got the same initial version from job.init(), then submits job.commit() before job 1 does, job 1 doesn't increment to the version it expected to.
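The guess above can be illustrated with a toy model of an optimistic version check. This is not Glue's actual implementation; the BookmarkStore class and its methods are invented purely to show how two runs that read the same version could produce the message from the question.

```python
class VersionMismatch(Exception):
    """Stand-in for Glue's VersionMismatchException (toy model only)."""


class BookmarkStore:
    """Toy bookmark state guarded by an optimistic version check."""

    def __init__(self, version=0):
        self.version = version

    def init(self):
        # A run remembers the version it saw when it started.
        return self.version

    def commit(self, expected):
        # The update succeeds only if nobody committed in between.
        if self.version != expected:
            raise VersionMismatch(
                f"Expected version {expected} but found version {self.version}"
            )
        self.version += 1


store = BookmarkStore(version=6)
run1 = store.init()   # run 1 sees version 6
run2 = store.init()   # run 2 starts concurrently and also sees 6
store.commit(run2)    # run 2 commits first; the stored version becomes 7

try:
    store.commit(run1)  # run 1 still expects version 6
    error = None
except VersionMismatch as exc:
    error = str(exc)

print(error)  # Expected version 6 but found version 7
```

Under this model, giving each concurrent run its own bookmark name, or disabling bookmarks, avoids the conflict because no two runs share the same versioned state.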
In fact, I was running the 2 Jobs with Job bookmark: Enable. Indeed, after disabling bookmarking, it appears to work for me.
I understand it might not be the best solution but it can be a good compromise.
Upvotes: 3