gamars
gamars

Reputation: 257

How to set diskSourceImage in google data flow pipeline

I've been trying to use custom made images to run my google data flow pipeline. Given the information from https://cloud.google.com/compute/docs/reference/latest/images I've tested the following code snippets:

DataflowPipelineOptions options = PipelineOptionsFactory.create().as(DataflowPipelineOptions.class);
...
options.setDiskSourceImage("ubuntu-1504-vivid-v20150911");
options.setDiskSourceImage("projects/ubuntu-os-cloud/global/images/ubuntu-1504-vivid-v20150911");
options.setDiskSourceImage("https://www.googleapis.com/compute/beta/projects/ubuntu-os-cloud/global/images/ubuntu-1504-vivid-v20150911");

all of the above tries led to the following error in my pipeline:

(b9c7b66a676906f4): Unable to create VMs. Causes: (b9c7b66a67690aef): Error: Message: Invalid value for field 'resource.disks[0].initializeParams.sourceImage': '[edited]'. Must be the URL to a Compute resource of the correct type HTTP Code: 400

Upvotes: 1

Views: 278

Answers (2)

Jeremy Lewi
Jeremy Lewi

Reputation: 6776

Using a custom disk image with Dataflow is not a viable option. The flag diskSourceImage is deprecated and will be removed in a future SDK release. The reason it is no longer supported is because the Dataflow service relies on versioned resources in the VM image. So Dataflow needs control of the VM image so that we can upgrade it as necessary. If users supply their own custom images we have no way of keeping them in sync with the requirements of the Dataflow service.

If your custom VM image is based off a Dataflow image then you would be able to execute jobs using that custom image until the next release of a Dataflow VM image. There is no reasonable way in which you would be able to keep your custom images in sync with Dataflow's VM images so that you would be able to keep this working.

If you would like to customize the VM image please let us know why (e.g. send us an email at [email protected]) so we can either suggest an alternative solution or else consider supporting your use case in the future.

Upvotes: 1

Sam McVeety
Sam McVeety

Reputation: 3214

There's a subtle issue with setDiskSourceImage -- it uses 'beta' instead of the current 'v1' version for Compute Engine. If you try the following, it should work:

options.setDiskSourceImage("https://www.googleapis.com/compute/v1/projects/ubuntu-os-cloud/global/images/ubuntu-1504-vivid-v20150911");

Upvotes: 0

Related Questions