Ricky Lui

Reputation: 155

How to create a custom dataproc image based on a preview release

I was trying to create a custom Dataproc image in GCP. It works fine with a base image that is in stable release (for example 1.3.24). However, if I specify a base image that is in preview (for example 1.4.0), I receive the following error messages:

if I specify one of the following as --dataproc-version,

I get RuntimeError: ('Cannot find dataproc base image with dataproc-version=%s.', '<the specified version>')

if I specify one of the following as --dataproc-version

I get generate_custom_image.py: error: argument --dataproc-version: Invalid version: <the specified version>.

Therefore the question is, can we build a custom Dataproc image based on a preview release? If so how should I specify the --dataproc-version?

Thank you so much

Upvotes: 0

Views: 1723

Answers (3)

Aniket Mokashi

Reputation: 177

Thanks for reporting and fixing this! Note that the Python version was changed from 3.7 to 3.6 in the latest image release, according to this.

Upvotes: 0

howie

Reputation: 2695

According to the source code of generate_custom_image.py:

# Old style images: 1.2.3
# New style images: 1.2.3-deb8
_VERSION_REGEX = re.compile(r"^\d+\.\d+\.\d+(-.{4})?$")

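A version such as 1.4.0-deb9 matches this regex, but 1.4.0-RC10-deb9 does not: after the three numeric components, the pattern allows only one hyphen followed by exactly four characters.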
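
To see which version strings pass the check, here is a small sketch that applies the pattern quoted above to a few candidates. The helper name is_accepted and the list of candidates are only for illustration; they are not part of generate_custom_image.py.

```python
import re

# Pattern as quoted above from generate_custom_image.py.
_VERSION_REGEX = re.compile(r"^\d+\.\d+\.\d+(-.{4})?$")


def is_accepted(version):
    """Return True if the version string passes the quoted regex (illustrative helper)."""
    return _VERSION_REGEX.match(version) is not None


# Sample version strings chosen for illustration.
candidates = ["1.3.24", "1.4.0", "1.4.0-deb9", "1.4.0-RC10-deb9", "1.4", "preview"]
results = {version: is_accepted(version) for version in candidates}

for version, ok in results.items():
    print(f"{version:18} {'accepted' if ok else 'rejected'}")
```

Passing the regex only gets past argument validation. The script still has to find a base image for that version afterwards, which is where the "Cannot find dataproc base image" error in the question comes from.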

If you want to use a preview release, you need to change the regex in generate_custom_image.py.
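
As a rough idea of what such a change could look like, the sketch below allows an optional -RC<number> segment before the OS suffix. This pattern is my own assumption for illustration and is not necessarily what the pull request does; the names _RELAXED_VERSION_REGEX and is_accepted_relaxed are hypothetical.

```python
import re

# Hypothetical relaxed pattern (an assumption, not the upstream fix):
# three numeric components, an optional "-RC<number>" segment, then an
# optional hyphen plus four-character suffix such as "-deb9".
_RELAXED_VERSION_REGEX = re.compile(r"^\d+\.\d+\.\d+(-RC\d+)?(-.{4})?$")


def is_accepted_relaxed(version):
    """Return True if the version string passes the relaxed pattern."""
    return _RELAXED_VERSION_REGEX.match(version) is not None


# Stable-style versions still pass, and release-candidate versions now pass too.
relaxed_results = {
    version: is_accepted_relaxed(version)
    for version in ["1.3.24", "1.4.0-deb9", "1.4.0-RC10-deb9", "1.4", "preview"]
}
print(relaxed_results)
```

Relaxing the regex only affects validation of --dataproc-version; whether the script then finds a matching base image depends on how it looks the version up.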

Update: I have sent a pull request to the Dataproc custom image project.

Upvotes: 3

Guillem Xercavins

Reputation: 7058

When using the CLI I get the following error:

ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Failed to resolve image version '1.4'. Accepted image versions: [0.1, 0.2, 1.0, 1.0-deb9, 1.1, 1.1-deb9, 1.2, 1.2-deb9, 1.3, 1.3-deb9, preview]. See https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions for additional information on image versioning.

If I use preview instead, it works and creates a cluster with image version 1.4.0-RC10-deb9:

gcloud dataproc clusters create cluster-name --image-version preview

EDIT: regarding custom images

After inspecting the script, I found that it retrieves the images using this filter. If we just list all of them, we can see the available ones, such as:

$ gcloud compute images list --project cloud-dataproc
...
dataproc-1-4-deb9-20190213-000000-rc01                cloud-dataproc                                                   READY

One possible way to select that one would be to replace lines 122-123 of generate_custom_image.py with:

filter_arg = "--filter=name:dataproc-1-4-deb9-20190213-000000-rc01"

and call the script with a dummy version for the regex:

python generate_custom_image.py --dataproc-version 1.2.0 ...
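
Before patching the script, it may be worth confirming that the image name you plan to hard-code is really listed in the cloud-dataproc project. The sketch below wraps the same gcloud listing in Python. The function names build_list_command and image_exists are hypothetical, and it assumes gcloud is installed and authenticated and that Python 3.7+ is available.

```python
import subprocess


def build_list_command(image_name, project="cloud-dataproc"):
    """Build the gcloud argv that lists images whose name matches image_name."""
    return [
        "gcloud", "compute", "images", "list",
        "--project", project,
        f"--filter=name:{image_name}",
        "--format=value(name)",  # print only the image names, one per line
    ]


def image_exists(image_name, project="cloud-dataproc"):
    """Run gcloud and report whether image_name is among the listed names.

    Assumes gcloud is on PATH and authenticated. The filter may match more
    than one name, so the exact comparison is done here in Python.
    """
    completed = subprocess.run(
        build_list_command(image_name, project),
        check=True,
        capture_output=True,
        text=True,
    )
    return image_name in completed.stdout.split()
```

For example, image_exists("dataproc-1-4-deb9-20190213-000000-rc01") should return True while that preview image is still published. Nothing is executed until you call it.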

Upvotes: 2
