Vineeth Narayanan

Reputation: 21

Error with TPUClusterResolver for Cloud TPU v3 Pod with TensorFlow 2.1

I'm trying to use my (preemptible) Cloud TPU v3-256 from a Google Compute Engine VM with TensorFlow 2.1, but TPUClusterResolver fails with a "Could not lookup TPU metadata" error.

Individual (non-preemptible) TPUs work fine as long as I pass the grpc:// address rather than the TPU name. When I pass the TPU name instead, both the individual TPUs and my TPU Pod fail with this error.

Can someone help me fix this issue?

Code:

import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu-name', zone='europe-west4-a', project='my-project')  # The zone, project and TPU name are correct

Output:

ValueError: Could not lookup TPU metadata from name 'my-tpu-name'. Please double check the tpu argument in the TPUClusterResolver constructor.
Exception: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=True from the Google Compute Enginemetadata service. Response: {'metadata-flavor': 'Google', 'date': 'Thu, 28 May 2020 17:42:35 GMT', 'content-type': 'text/html; charset=UTF-8', 'server': 'Metadata Server for VM', 'content-length': '1629', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'status': '404'}

Upvotes: 2

Views: 1615

Answers (1)

Gagik

Reputation: 486

I suspect a mismatch between the Compute Engine VM and the TPU in one of the following: TensorFlow version, zone, or project. If you create both the TPU and the GCE VM with the same TensorFlow version (2.1 or 2.2), in the same project and zone, you can pass just the TPU name to TPUClusterResolver and it should work fine:

resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu-name') 

You can omit the TPU name if you set the TPU_NAME environment variable (export TPU_NAME=my-tpu-name) on your VM.
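To make the two options above concrete, here is a minimal sketch rather than a definitive setup. The helper names resolver_kwargs and connect_to_tpu are hypothetical, and the sketch assumes the TensorFlow 2.1 API names (experimental_connect_to_cluster, initialize_tpu_system, TPUStrategy). It passes zone and project only when you give them explicitly, and otherwise takes the TPU name from the argument or from TPU_NAME.

```python
import os


def resolver_kwargs(tpu=None, zone=None, project=None, environ=None):
    """Collect keyword arguments for TPUClusterResolver (hypothetical helper).

    Uses the explicit name if given, otherwise the TPU_NAME environment
    variable. zone and project are left out unless passed explicitly.
    """
    environ = os.environ if environ is None else environ
    name = tpu or environ.get("TPU_NAME")
    if not name:
        raise ValueError("Pass a TPU name or set the TPU_NAME environment variable.")
    kwargs = {"tpu": name}
    if zone:
        kwargs["zone"] = zone
    if project:
        kwargs["project"] = project
    return kwargs


def connect_to_tpu(**overrides):
    """Resolve the TPU, connect to it, and return a TPUStrategy."""
    # Imported lazily so resolver_kwargs can be used without TensorFlow installed.
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
        **resolver_kwargs(**overrides)
    )
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.experimental.TPUStrategy(resolver)
```

On the VM you would then call connect_to_tpu(tpu='my-tpu-name'), or connect_to_tpu() after exporting TPU_NAME. If the lookup still fails, printing tf.__version__ on the VM and comparing it with the TPU's software version is a quick way to check for the version mismatch mentioned above.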

Upvotes: 3
