Reputation: 481
I have a Flink Job Manager running remotely, and am using the REST Monitoring API to schedule jobs.
Flink supports savepoints, so I can interrupt a running job and resume it from saved state. The persistence location for savepoint data should be external to the job manager (so that, for example, I can reconfigure and restart the job manager and resume the job), such as an HDFS volume or S3 bucket. This location can be configured globally on the job manager with the state.savepoints.dir config key, but it can supposedly also be passed in with the interrupt command:
# When job manager is running locally
$ bin/flink cancel -s s3:///savepoint-bucket/my-awesome-job <jobID>
When running remotely, the corresponding REST request would be
GET https://jm-host:8081/jobs/:jobid/cancel-with-savepoint/target-directory/s3:///savepoint-bucket/my-awesome-job-savepoints
However, the REST handler cannot process this request: it returns a 502 BAD GATEWAY because it reads the slashes as path component separators. Passing in an unqualified directory name instead, as in
GET https://jm-host:8081/jobs/:jobid/cancel-with-savepoint/target-directory/my-awesome-job-savepoints
does not resolve my-awesome-job-savepoints as a subdirectory of the remote persistence location (even if the state.savepoints.dir config key is set), but instead attempts to create the my-awesome-job-savepoints subdirectory in the current working directory of the REST service Java application.
I have tried URL-encoding the fully qualified persistence path, but this does not help. Is there any way to pass this fully qualified path to the Job Manager through the REST API? (Assume for the purposes of this question that it's impossible to use bin/flink -m jm-host:8081.)
Upvotes: 1
Views: 971
Reputation: 1060
Be sure to carefully encode the target directory with percent-encoding, as per RFC 3986.
For example, given the directory s3:///savepoint-bucket/my-awesome-job, which encodes to s3%3A%2F%2F%2Fsavepoint-bucket%2Fmy-awesome-job, I was able to submit the following URL:
http://localhost:8081/jobs/5c360ded6e4b7d8db103e71d68b7c83d/cancel-with-savepoint/target-directory/s3%3A%2F%2F%2Fsavepoint-bucket%2Fmy-awesome-job
And see the following in the log:
2017-09-19 14:27:45,939 INFO org.apache.flink.runtime.jobmanager.JobManager - Trying to cancel job 5c360ded6e4b7d8db103e71d68b7c83d with savepoint to s3:///savepoint-bucket/my-awesome-job
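If you build the request URL programmatically, the encoding step above can be sketched roughly as follows. This is only an illustration in Python: the helper name savepoint_cancel_url is hypothetical, and the host, port, and job ID are simply the values from the example above. The key detail is passing safe="" to urllib.parse.quote, because by default it leaves "/" unencoded, which would reproduce the path-separator problem from the question.

```python
from urllib.parse import quote


def savepoint_cancel_url(base_url: str, job_id: str, target_dir: str) -> str:
    """Build a cancel-with-savepoint URL (hypothetical helper, not part of Flink).

    The endpoint layout is copied from the URL shown in this answer.
    """
    # safe="" forces "/" (and ":") to be percent-encoded; quote() would
    # otherwise keep "/" as-is and the handler would see extra path segments.
    encoded_dir = quote(target_dir, safe="")
    return (
        f"{base_url.rstrip('/')}/jobs/{job_id}"
        f"/cancel-with-savepoint/target-directory/{encoded_dir}"
    )


url = savepoint_cancel_url(
    "http://localhost:8081",
    "5c360ded6e4b7d8db103e71d68b7c83d",
    "s3:///savepoint-bucket/my-awesome-job",
)
print(url)
```

The printed URL should match the one I submitted above. Note that some HTTP clients and proxies normalize or decode %2F before the request reaches the Job Manager, so if you still get an error it may be worth checking what is actually sent on the wire.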
Upvotes: 2