Reputation: 195
I've tried deploying the Marketplace solution Deep Learning VM (Google Click to Deploy) using TF 2.0 with a GPU. I'm doing this through the UI, selecting the zone and other instance options there.
As soon as I deploy, however, and get taken to the Deployment Manager screen, I see the following error:
jupyterlab-eu-w-4c-vm: {"ResourceType":"compute.v1.instance","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"errors":[{"domain":"global","message":"Invalid value for field 'resource.disks[0].initializeParams.sourceImage': 'https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100-experimental-20190821'. The referenced image resource cannot be found.","reason":"invalid"}],"message":"Invalid value for field 'resource.disks[0].initializeParams.sourceImage': 'https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100-experimental-20190821'. The referenced image resource cannot be found.","statusMessage":"Bad Request","requestPath":"https://compute.googleapis.com/compute/v1/projects/jupyterlab-instance/zones/europe-west4-c/instances","httpMethod":"POST"}}
The key part is that the image resource cannot be found at that URL:
https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100-experimental-20190821
I searched for the available images from Cloud Shell:
@cloudshell:~ (jupyterlab-instance)$ gcloud compute images list --project click-to-deploy-images --no-standard-images --uri | grep tf-2-0-cu100
https://www.googleapis.com/compute/v1/projects/click-to-deploy-images/global/images/tf-2-0-cu100--experimental-20190821
Notice that the URL is different: there is an extra "-" in the image name compared to what the deployment script is trying to fetch:
tf-2-0-cu100-experimental-20190821
tf-2-0-cu100--experimental-20190821
This looks like an unintentional typo.
My question, though, is how I can go about deploying this VM. Is there a way to modify the deployment script that the UI generates before deploying, or do I need to do the whole deployment via the CLI so I can add in the extra "-"?
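For reference, I imagine the CLI route would look roughly like this (untested sketch; the GPU type, count and maintenance policy are placeholders I'd still have to match to what the UI sets up, but the image name is the double-dash one returned by the list above):
# create the instance directly from the correctly named (double-dash) image
gcloud compute instances create jupyterlab-eu-w-4c-vm \
    --zone=europe-west4-c \
    --image=tf-2-0-cu100--experimental-20190821 \
    --image-project=click-to-deploy-images \
    --accelerator=count=1,type=nvidia-tesla-p4 \
    --maintenance-policy=TERMINATE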
Is there also a way I can raise this so someone can fix the typo? I presume it is preventing anyone from deploying a TensorFlow 2 GPU instance via the UI tools using the Deep Learning VM.
Thanks for your help.
Upvotes: 2
Views: 759
Reputation: 438
I encountered the same problem. That VM will not deploy with the TF 2.0 version because the boot image URL seems to be broken. It's not related to the zone (I've tried deploying without GPUs and in different zones; it won't work).
One solution is to deploy the image as an instance directly (see documentation 1):
gcloud compute instances create $INSTANCE_NAME \
    --zone=$ZONE \
    --image-family=tf2-latest-gpu \
    --image-project=deeplearning-platform-release \
    --accelerator=count=1,type=nvidia-tesla-k80
(I used the CPU image family myself, but tf2-latest-gpu seems to fit your case.)
Add any other options you need (GPU count, machine type, etc.).
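For example, a concrete invocation might look like this (the instance name, zone and machine type are just placeholders; pick a zone where your GPU type is actually offered, and as far as I know the install-nvidia-driver metadata key is what these images use to install the driver on first boot):
# example with explicit machine type, boot disk size and GPU driver install
gcloud compute instances create my-tf2-vm \
    --zone=us-west1-b \
    --image-family=tf2-latest-gpu \
    --image-project=deeplearning-platform-release \
    --machine-type=n1-standard-4 \
    --accelerator=count=1,type=nvidia-tesla-k80 \
    --maintenance-policy=TERMINATE \
    --boot-disk-size=100GB \
    --metadata=install-nvidia-driver=True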
You can get help for the command with
gcloud compute instances create --help
To list all available images, use
gcloud compute images list --project deeplearning-platform-release --no-standard-images
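If you only want the TensorFlow 2 families, a quick grep narrows it down (same trick as in the question):
gcloud compute images list --project deeplearning-platform-release --no-standard-images | grep tf2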
Upvotes: 1
Reputation: 649
I had a very similar problem, and it turned out that I was trying to deploy a GPU model in a zone in which it wasn't supported. Take a look here to see whether the GPU type you're using is supported in "europe-west4-c". For example, if you're using a K80, then it's not available in that zone.
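If you'd rather check this from the command line, listing the accelerator types per zone should show what's available (this uses the standard gcloud --filter syntax; adjust the zone as needed):
gcloud compute accelerator-types list --filter="zone:europe-west4-c"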
Upvotes: 0