Reputation: 121
The error message is this:
Traceback (most recent call last):
  File "./run_classifier.py", line 914, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.5/dist-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "./run_classifier.py", line 839, in main
    estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2876, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/error_handling.py", line 131, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python3.5/dist-packages/six.py", line 693, in reraise
    raise value
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2871, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 364, in train
    hooks.extend(self._convert_train_steps_to_hooks(steps, max_steps))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_estimator.py", line 2746, in _convert_train_steps_to_hooks
    if ctx.is_running_on_cpu():
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 442, in is_running_on_cpu
    self._validate_tpu_configuration()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_estimator/python/estimator/tpu/tpu_context.py", line 613, in _validate_tpu_configuration
    'are {}.'.format(tpu_system_metadata.devices))
RuntimeError: Cannot find any TPU cores in the system. Please double check Tensorflow master address and TPU worker(s). Available devices are (_DeviceAttributes(/job:localhost/replica:0/task:0/device:CPU:0, CPU, 268435456, 8731946518164767128), _DeviceAttributes(/job:localhost/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 14576891077092876257)).
I am trying to run the example from the paper bert-gcn-for-paper-citation.
The authors open-sourced the code on GitHub (https://github.com/TeamLab/bert-gcn-for-paper-citation), so I followed the README, ran the code on Google Cloud Platform (GCP), and got the error above.
Before running it, I verified BERT and GAE on GCP, because the paper and code are built on those two models. Both ran successfully. (I followed the GCP BERT tutorial at https://cloud.google.com/tpu/docs/tutorials/bert.)
BERT and BERT-GCN take almost the same flags.
BERT example code
python3 ./bert/run_classifier.py \
--task_name=${TASK_NAME} \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/${TASK_NAME} \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=${STORAGE_BUCKET}/${TASK_NAME}-output/ \
--use_tpu=True \
--tpu_name=$TPU_NAME
and I slightly changed it for BERT-GCN (adding the last three flags):
python3 ./run_classifier.py \
--task_name=${TASK_NAME} \
--do_train=true \
--do_eval=true \
--data_dir=$GLUE_DIR/${TASK_NAME} \
--vocab_file=$BERT_BASE_DIR/vocab.txt \
--bert_config_file=$BERT_BASE_DIR/bert_config.json \
--init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
--max_seq_length=128 \
--train_batch_size=32 \
--learning_rate=2e-5 \
--num_train_epochs=3.0 \
--output_dir=${STORAGE_BUCKET}/${TASK_NAME}-output/ \
--use_tpu=True \
--tpu_name=$TPU_NAME \
--model=bert_base \
--dataset=PeerRead \
--frequency=1
But this produced the error above. I googled it, but none of the results solved my problem.
Could you help me out?
Upvotes: 2
Views: 971
Reputation: 1824
The issue here is that gcloud doesn't pass the value of the $TPU_NAME environment variable through to the shell where your script runs, so the flag ends up empty and TensorFlow falls back to the local CPU devices. You need to specify your TPU name explicitly.
USE:
--tpu_name=my-tpu
INSTEAD OF:
--tpu_name=$TPU_NAME
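To make the failure obvious up front, you could guard the launch with a check like the sketch below. This is only an illustrative snippet, not part of the repository's code; `check_tpu_name` is a hypothetical helper name, and `my-tpu` stands in for whatever name you gave your TPU node.

```shell
# Sketch: confirm the TPU name is visible in the current shell before
# launching run_classifier.py. An empty TPU_NAME makes TensorFlow fall
# back to the local CPU/XLA_CPU devices, which matches the error above.
check_tpu_name() {
  if [ -z "${TPU_NAME}" ]; then
    echo "TPU_NAME is empty; pass the name literally, e.g. --tpu_name=my-tpu" >&2
    return 1
  fi
  echo "Using TPU: ${TPU_NAME}"
}
```

Then run `check_tpu_name && python3 ./run_classifier.py ... --tpu_name="${TPU_NAME}"` so the job aborts early instead of silently training without a TPU.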
Upvotes: 1