Reputation: 21
I'm trying to use my (preemptible) Cloud TPU v3-256 from my Google Compute Engine VM with TensorFlow 2.1, but it isn't working: TPUClusterResolver throws a Could not lookup TPU metadata error.
Individual (non-preemptible) TPUs work fine as long as I use the grpc:// address rather than the TPU name. However, neither the individual TPUs nor my TPU Pod work when I pass the TPU name; both throw this error.
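For reference, the grpc:// workaround described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the helper names tpu_target and connect, the IP 10.0.0.2, and the port 8470 are assumptions for illustration (8470 is the port Cloud TPU workers usually listen on). It covers the single-TPU case only.

```python
def tpu_target(host, port=8470):
    """Build a grpc:// target string for a single TPU worker.

    The default port 8470 is an assumption; check the TPU's details page.
    """
    return f"grpc://{host}:{port}"


def connect(target):
    """Connect to the TPU at `target` and return a distribution strategy."""
    import tensorflow as tf  # imported lazily so the helper above works without TF

    # Passing a grpc:// address skips the name lookup that fails in the question.
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=target)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.experimental.TPUStrategy(resolver)


# Hypothetical usage on the VM:
# strategy = connect(tpu_target("10.0.0.2"))
```

Because the address is given directly, no name lookup is needed, which is consistent with the address working where the name does not.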
Can someone help me fix this issue?
Code:
import tensorflow as tf
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu-name', zone='europe-west4-a', project='my-project')  # The zone, project and TPU name are correct
Output:
ValueError: Could not lookup TPU metadata from name 'my-tpu-name'. Please double check the tpu argument in the TPUClusterResolver constructor.
Exception: Failed to retrieve http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/?recursive=True from the Google Compute Enginemetadata service. Response: {'metadata-flavor': 'Google', 'date': 'Thu, 28 May 2020 17:42:35 GMT', 'content-type': 'text/html; charset=UTF-8', 'server': 'Metadata Server for VM', 'content-length': '1629', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'status': '404'}
Upvotes: 2
Views: 1615
Reputation: 486
I suspect a mismatch between the Compute Engine VM and the TPU in one of the following: TensorFlow version, zone, or project.
If you create both the TPU and the GCE VM with the same TensorFlow version (2.1 or 2.2), in the same project and zone, you can just pass the TPU name to TPUClusterResolver and it should work fine:
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu='my-tpu-name')
You can omit the TPU name if you set the TPU_NAME environment variable (export TPU_NAME=my-tpu-name) on your VM.
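A minimal sketch of both options, an explicit name or the TPU_NAME fallback, might look like the following. The helpers resolver_kwargs and make_resolver are hypothetical names introduced here for illustration; only TPUClusterResolver and its tpu, zone and project arguments come from the posts above. The explicit environment lookup is there only to fail early with a clear message when no name is available.

```python
import os


def resolver_kwargs(tpu=None, zone=None, project=None):
    """Collect TPUClusterResolver arguments, dropping any left unset.

    Falls back to the TPU_NAME environment variable when no name is passed.
    """
    name = tpu or os.environ.get("TPU_NAME")
    if not name:
        raise ValueError("Pass a TPU name or export TPU_NAME first.")
    kwargs = {"tpu": name}
    # zone and project are only needed when they differ from the VM's own.
    if zone:
        kwargs["zone"] = zone
    if project:
        kwargs["project"] = project
    return kwargs


def make_resolver(**overrides):
    """Create the resolver; TensorFlow is imported lazily."""
    import tensorflow as tf

    return tf.distribute.cluster_resolver.TPUClusterResolver(
        **resolver_kwargs(**overrides)
    )


# Hypothetical usage on the VM, after `export TPU_NAME=my-tpu-name`:
# resolver = make_resolver()
```

If the VM and the TPU really are in the same project and zone, leaving zone and project unset keeps the call equivalent to the one-argument form above.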
Upvotes: 3