Reputation: 65
I already have a torch model (BERT), and I'd like to use the ai-platform service to get online predictions using a GPU, but I can't figure out how to do it.
The following command, without an accelerator, works:
gcloud alpha ai-platform versions create {VERSION} \
  --model {MODEL_NAME} \
  --origin=gs://{BUCKET}/models/ \
  --python-version=3.5 \
  --runtime-version=1.14 \
  --package-uris=gs://{BUCKET}/packages/my-torch-package-0.1.tar.gz,gs://cloud-ai-pytorch/torch-1.0.0-cp35-cp35m-linux_x86_64.whl \
  --machine-type=mls1-c4-m4 \
  --prediction-class=predictor.CustomModelPrediction
However, if I try to add the accelerator parameter:
--accelerator=^:^count=1:type=nvidia-tesla-k80
I get the following error message:
ERROR: (gcloud.alpha.ai-platform.versions.create) INVALID_ARGUMENT: Field: version.machine_type Error: GPU accelerators are not supported on the requested machine type: mls1-c4-m4
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: 'GPU accelerators are not supported on the requested machine type: mls1-c4-m4'
    field: version.machine_type
But if I use a different machine type that I know can be used with an accelerator, I get the following error:
ERROR: (gcloud.alpha.ai-platform.versions.create) FAILED_PRECONDITION: Field: framework Error: Machine type n1-highcpu-4 does not support CUSTOM_CLASS.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: Machine type n1-highcpu-4 does not support CUSTOM_CLASS.
    field: framework
It seems that any machine type that supports GPU accelerators doesn't support custom classes (which, as far as I know, are required to use Torch), and any machine type that supports custom classes doesn't support GPU accelerators.
Any way to make it work?
There are a bunch of tutorials on how to use ai-platform with Torch, but I can't see the point of using gcloud to train and predict if everything has to run on the CPU, so this feels very odd to me.
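For reference, predictor.CustomModelPrediction implements the standard custom prediction routine interface (a from_path class method plus a predict method). A simplified sketch is below; my actual BERT loading and preprocessing are replaced with illustrative placeholders.

# predictor.py -- simplified sketch of the custom prediction class
# (real model/tokenizer handling is more involved; names below are illustrative)
import os
import torch

class CustomModelPrediction(object):
    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        # instances: list of JSON-decoded inputs sent to the model version
        with torch.no_grad():
            inputs = torch.tensor(instances)  # placeholder for real preprocessing
            outputs = self._model(inputs)
        return outputs.tolist()

    @classmethod
    def from_path(cls, model_dir):
        # model_dir is the local copy of the --origin directory
        model = torch.load(os.path.join(model_dir, 'model.pt'), map_location='cpu')
        model.eval()
        return cls(model)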
Upvotes: 1
Views: 330
Reputation: 10058
PyTorch + GPU is not available in AI Platform Prediction, but you can use Deep Learning VM images and build your own PyTorch serving with GPU support.
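A minimal sketch of what that custom serving could look like on a GPU Deep Learning VM, using Flask purely as an example (the model path and request schema are assumptions, not anything AI Platform-specific):

# serve.py -- minimal custom PyTorch serving on a GPU-enabled Deep Learning VM
# (Flask chosen for illustration; model path and payload format are assumptions)
import torch
from flask import Flask, jsonify, request

app = Flask(__name__)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = torch.load('/opt/models/model.pt', map_location=device)  # hypothetical path
model.eval()

@app.route('/predict', methods=['POST'])
def predict():
    instances = request.get_json()['instances']  # assumed request schema
    with torch.no_grad():
        inputs = torch.tensor(instances).to(device)
        outputs = model(inputs)
    return jsonify({'predictions': outputs.cpu().tolist()})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)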
Update: You can now use AI Platform Prediction with custom containers.
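With the container route, the idea is to package an HTTP prediction server (for example, the Flask sketch above, or TorchServe) into an image and deploy it to a regional endpoint. Roughly along these lines, though the exact flags should be checked against the current gcloud docs, and the image, routes, machine type and accelerator here are placeholders:

gcloud beta ai-platform versions create {VERSION} \
  --model={MODEL_NAME} \
  --region=us-central1 \
  --machine-type=n1-standard-4 \
  --accelerator=count=1,type=nvidia-tesla-t4 \
  --image=gcr.io/{PROJECT}/my-torch-serving:latest \
  --ports=8080 \
  --predict-route=/predict \
  --health-route=/health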
Upvotes: 1
Reputation: 417
As of now, using Custom Prediction Routines is in Beta. In addition, using machine types other than mls1-c1-m2 is also in Beta.
Nevertheless, as you can see in the previously referenced link, GPUs are not available for mls1-type machines. At the same time, these are the only machine types that allow models outside of TensorFlow.
In summary, deploying your prediction model in Torch and using a GPU is probably not a feasible option right now.
Upvotes: 1