Reputation: 155
I was trying to create a custom Dataproc image in GCP. It works fine with a base image that is a stable release (for example 1.3.24). However, if I specify a base image that is in preview (for example 1.4.0), I receive the following errors:
If I specify any of the following as --dataproc-version:
- 1.4.0-deb9
- 1.4.0
- 1.4

I get RuntimeError: ('Cannot find dataproc base image with dataproc-version=%s.', '<the specified version>')

If I specify any of the following as --dataproc-version:
- 1.4.0-RC8
- 1.4.0-RC8-deb9

I get generate_custom_image.py: error: argument --dataproc-version: Invalid version: <the specified version>.
So the question is: can we build a custom Dataproc image based on a preview release? If so, how should I specify --dataproc-version?
Thank you so much
Upvotes: 0
Views: 1723
Reputation: 177
Thanks for reporting and fixing this! Note that the Python version was changed from 3.7 to 3.6 in the latest image release, according to this.
Upvotes: 0
Reputation: 2695
According to the source code of generate_custom_image.py:

```python
# Old style images: 1.2.3
# New style images: 1.2.3-deb8
_VERSION_REGEX = re.compile(r"^\d+\.\d+\.\d+(-.{4})?$")
```

1.4.0-deb9 (and plain 1.4.0) match the regex, but 1.4.0-RC10-deb9 does not, because anything after the patch version must be a hyphen followed by exactly four characters.
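To see which of the versions from the question get past this check, the pattern can be exercised on its own. This is a small standalone sketch that copies the regex quoted above; the helper name `accepts` is hypothetical and not part of the script.

```python
import re

# Pattern copied from generate_custom_image.py as quoted above.
_VERSION_REGEX = re.compile(r"^\d+\.\d+\.\d+(-.{4})?$")


def accepts(version: str) -> bool:
    """Hypothetical helper: True if the script's version regex matches."""
    return _VERSION_REGEX.match(version) is not None


if __name__ == "__main__":
    for version in ["1.4.0-deb9", "1.4.0", "1.4.0-RC8", "1.4.0-RC8-deb9", "1.4.0-RC10-deb9"]:
        print(f"{version:18} {'accepted' if accepts(version) else 'rejected'}")
```

The two accepted strings are the ones that then fail later with the RuntimeError, while the RC strings are rejected up front with "Invalid version".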
If you want to use a preview release, you need to change the regex in generate_custom_image.py.
Update: I have sent a pull request to the Dataproc custom images project.
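One way the regex could be relaxed is sketched below. This is only an illustration, not necessarily the change made in the pull request mentioned above: the optional `-RC<digits>` group is an assumption based on the 1.4.0-RC8 and 1.4.0-RC10-deb9 names seen in this thread, and `is_valid_version` is a hypothetical helper name.

```python
import re

# Hypothetical relaxed pattern: an optional release-candidate suffix such as
# -RC8 or -RC10, followed by the optional four-character OS suffix such as -deb9.
_RELAXED_VERSION_REGEX = re.compile(r"^\d+\.\d+\.\d+(-RC\d+)?(-.{4})?$")


def is_valid_version(version: str) -> bool:
    """Return True if version matches the relaxed pattern."""
    return _RELAXED_VERSION_REGEX.match(version) is not None


if __name__ == "__main__":
    for v in ["1.3.24", "1.4.0-deb9", "1.4.0-RC8", "1.4.0-RC10-deb9", "1.4"]:
        print(v, is_valid_version(v))
```

Passing the regex is only half of it: the script must still find a matching base image, which is where the RuntimeError in the question comes from.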
Upvotes: 3
Reputation: 7058
When using the CLI I get the following error:
ERROR: (gcloud.dataproc.clusters.create) INVALID_ARGUMENT: Failed to resolve image version '1.4'. Accepted image versions: [0.1, 0.2, 1.0, 1.0-deb9, 1.1, 1.1-deb9, 1.2, 1.2-deb9, 1.3, 1.3-deb9, preview]. See https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-versions for additional information on image versioning.
So if I use preview instead, it works and creates a cluster with 1.4.0-RC10-deb9:

```
gcloud dataproc clusters create cluster-name --image-version preview
```
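If cluster creation is scripted, the same command can be assembled from Python. This is a minimal sketch: `cluster-name` is a placeholder and the helper names `create_cluster_cmd` and `create_cluster` are hypothetical. It only builds the argument list shown above and runs it with `subprocess` when asked; no extra gcloud flags are added.

```python
import subprocess
from typing import List


def create_cluster_cmd(cluster_name: str, image_version: str = "preview") -> List[str]:
    """Build the gcloud invocation from this answer as an argument list."""
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        "--image-version", image_version,
    ]


def create_cluster(cluster_name: str, image_version: str = "preview") -> None:
    # Requires an installed, authenticated gcloud CLI; raises
    # subprocess.CalledProcessError if gcloud exits with a non-zero status.
    subprocess.run(create_cluster_cmd(cluster_name, image_version), check=True)
```

Keeping the command as a list (rather than one shell string) avoids quoting problems if the cluster name ever comes from user input.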
EDIT: regarding custom images
Looking at what the script does, it retrieves the images using this filter. If we just list all of them, we can see the available ones, such as:

```
$ gcloud compute images list --project cloud-dataproc
...
dataproc-1-4-deb9-20190213-000000-rc01 cloud-dataproc READY
```
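Rather than copying one image name by hand, the listing can be narrowed in Python. This is a sketch under assumptions: the naming pattern `dataproc-<major>-<minor>-<os>-<YYYYMMDD>-<HHMMSS>[-<suffix>]` is inferred from the single name shown above, and `latest_image` / `list_dataproc_images` are hypothetical helpers. Because the build stamp follows the prefix, the sketch assumes lexicographic order equals build order.

```python
import subprocess
from typing import Iterable, List, Optional


def latest_image(names: Iterable[str], prefix: str) -> Optional[str]:
    """Return the lexicographically greatest image name starting with prefix.

    Assumes names embed a YYYYMMDD-HHMMSS build stamp right after the prefix,
    so lexicographic order matches build order. Returns None if nothing matches.
    """
    matching = [n for n in names if n.startswith(prefix)]
    return max(matching) if matching else None


def list_dataproc_images() -> List[str]:
    # Same listing as above, but printing only the image names (needs gcloud).
    out = subprocess.run(
        ["gcloud", "compute", "images", "list", "--project", "cloud-dataproc",
         "--format=value(name)"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()
```

Usage would be something like `name = latest_image(list_dataproc_images(), "dataproc-1-4-deb9-")` followed by `filter_arg = "--filter=name:" + name`. The trailing hyphen in the prefix keeps `dataproc-1-4-` from also matching a hypothetical `dataproc-1-40-...` image.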
One possible way to select that image would be to replace lines 122-123 of generate_custom_image.py with:

```python
filter_arg = "--filter=name:dataproc-1-4-deb9-20190213-000000-rc01"
```

and call the script with a dummy version that only has to satisfy the regex:

```
python generate_custom_image.py --dataproc-version 1.2.0 ...
```
Upvotes: 2