Reputation: 61
Are TPUs supported for distributed hyperparameter search? I'm using the tensor2tensor
library, which supports Cloud ML Engine for hyperparameter search. For example, the following command works for me to run a hyperparameter search for a language model on GPUs:
t2t-trainer \
--model=transformer \
--hparams_set=transformer_tpu \
--problem=languagemodel_lm1b8k_packed \
--train_steps=100000 \
--eval_steps=8 \
--data_dir=$DATA_DIR \
--output_dir=$OUT_DIR \
--cloud_mlengine \
--hparams_range=transformer_base_range \
--autotune_objective='metrics-languagemodel_lm1b8k_packed/neg_log_perplexity' \
--autotune_maximize \
--autotune_max_trials=100 \
--autotune_parallel_trials=3
However, when I try to utilize TPUs as in the following:
t2t-trainer \
--problem=languagemodel_lm1b8k_packed \
--model=transformer \
--hparams_set=transformer_tpu \
--data_dir=$DATA_DIR \
--output_dir=$OUT_DIR \
--train_steps=100000 \
--use_tpu=True \
--cloud_mlengine_master_type=cloud_tpu \
--cloud_mlengine \
--hparams_range=transformer_base_range \
--autotune_objective='metrics-languagemodel_lm1b8k_packed/neg_log_perplexity' \
--autotune_maximize \
--autotune_max_trials=100 \
--autotune_parallel_trials=5
I get the error:
googleapiclient.errors.HttpError: <HttpError 400 when requesting https://ml.googleapis.com/v1/projects/******/jobs?alt=json returned "Field: master_type Error: The specified machine type for master is not supported in TPU training jobs: cloud_tpu"
Upvotes: 0
Views: 196
Reputation: 1501
One of the authors of the tensor2tensor library here. Yup, this was indeed a bug, and it is now fixed. Thanks for spotting it. We'll release a fixed version on PyPI this week; until then, you can clone the repository and install locally from master.
The command you used should work just fine now.
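For anyone waiting on the PyPI release, one common way to install straight from master (repository URL as linked in the other answer; this exact invocation is a sketch, not an official instruction from the authors) is:

```shell
# Install tensor2tensor directly from the current master branch
pip install git+https://github.com/tensorflow/tensor2tensor.git
```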
Upvotes: 3
Reputation: 100
I believe there is a bug in the tensor2tensor library: https://github.com/tensorflow/tensor2tensor/blob/6a7ef7f79f56fdcb1b16ae76d7e61cb09033dc4f/tensor2tensor/utils/cloud_mlengine.py#L281
It's the worker_type (and not the master_type) that needs to be set for Cloud ML Engine.
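To illustrate the distinction, here is a minimal sketch of the kind of training-input spec Cloud ML Engine expects for a TPU job. The field values follow the Cloud ML Engine docs (TPU jobs set `cloud_tpu` as the worker machine type, with a regular VM as master); the surrounding dict is illustrative, not the library's actual code:

```python
# Hypothetical sketch of a Cloud ML Engine TrainingInput spec for a TPU job.
# The TPU belongs on the worker, not the master: masterType stays a normal
# machine type, and workerType carries "cloud_tpu" with workerCount=1.
training_input = {
    "scaleTier": "CUSTOM",
    "masterType": "standard",   # ordinary VM that drives the job
    "workerType": "cloud_tpu",  # the TPU goes here, not on the master
    "workerCount": 1,
}
```

Setting `masterType: cloud_tpu` instead, as the buggy code path did, produces exactly the HTTP 400 error quoted in the question.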
To answer the original question though: yes, hyperparameter tuning should be supported on TPUs; the error above is orthogonal to that.
Upvotes: 2