Reputation: 11
I was following the tutorial on https://cloud.google.com/tpu/docs/how-to.
I created a TPU instance, and tried to connect to it with gcloud compute ssh
line. Then, this error occurred.
AppData\Local\Google\Cloud SDK>gcloud compute ssh node-1 --zone=asia-east1-c
PythonERROR: (gcloud.compute.ssh) Could not fetch resource:
- The resource 'projects/project-masker/zones/asia-east1-c/instances/node-1' was not found
Trying to solve this error, I found out that the tpus were not included in the execution group.
AppData\Local\Google\Cloud SDK>gcloud compute tpus list
PythonNAME ZONE ACCELERATOR_TYPE NETWORK RANGE STATUS
node-2 asia-east1-c v2-8 default 10.75.202.248/29 READY
node-1 asia-east1-c v2-8 default 10.82.81.168/29 READY
AppData\Local\Google\Cloud SDK>gcloud compute tpus execution-groups list
PythonListed 0 items.
This is what I got when I tried to restart the tpu.
PythonRequest issued for: [node-1]
Waiting for operation [projects/project-masker/locations/asia-east1-c/operations/operation-1625299249870-5c633787137b9-
e14800b7-d997be6b] to complete...done.
done: true
metadata:
'@type': type.googleapis.com/google.cloud.common.OperationMetadata
apiVersion: v1
cancelRequested: false
createTime: '2021-07-03T08:00:49.884674545Z'
endTime: '2021-07-03T08:01:31.161199334Z'
target: projects/project-masker/locations/asia-east1-c/nodes/node-1
verb: update
name: projects/project-masker/locations/asia-east1-c/operations/operation-1625299249870-5c633787137b9-e14800b7-d997be6b
response:
'@type': type.googleapis.com/google.cloud.tpu.v1.Node
acceleratorType: v2-8
apiVersion: V1
cidrBlock: 10.82.81.168/29
createTime: '2021-07-03T07:27:41.148997156Z'
health: HEALTHY
ipAddress: 10.82.81.170
name: projects/project-masker/locations/asia-east1-c/nodes/node-1
network: global/networks/default
networkEndpoints:
- ipAddress: 10.82.81.170
port: 8470
port: '8470'
schedulingConfig: {}
serviceAccount: [email protected]
state: READY
tensorflowVersion: pytorch-1.9
I tried to find some related articles on google, but I couldn't find any. How can I fix this?
Upvotes: 1
Views: 1859
Reputation: 301
You can't SSH to a TPU node directly, so gcloud compute ssh {tpu_name}
isn't expected to work.
You can, however, SSH directly to a TPU VM, please see this link. If you are already using TPU VM, then your issue is that you're trying
gcloud compute ssh
rather than
gcloud alpha compute tpus tpu-vm ssh ...
Upvotes: 3