Reputation: 1838

How can I Transfer Learn using TPU on Colab

I am trying to teach myself Transfer Learning techniques using Tensorflow 2 on Colab.

Using GPU is working fine but as everybody knows Google has its TPUs and they are faster than GPUs.

In Colab, when I switch "Type" from GPU to "TPU, I am adding --use_tpu=true to the command below

python /content/models/research/object_detection/model_main_tf2.py \
--pipeline_config_path={pipeline_fname} \
--model_dir={model_dir} \
--checkpoint_dir={model_dir} \
--eval_timeout=60 \
--use_tpu=true

This script is found in the Models repo.

git clone --quiet https://github.com/tensorflow/models.git

However, it is not working and a few minutes later, I get the following error message:

tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' **not implemented (file: '/content/driving-object-detection/training/train')**
    Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/tpu_strategy.py", line 540, in async_wait
    context.async_wait()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 2319, in async_wait
    context().sync_executors()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/context.py", line 658, in sync_executors
    pywrap_tfe.TFE_ContextSyncExecutors(self._context_handle)
tensorflow.python.framework.errors_impl.UnimplementedError: File system scheme '[local]' not implemented (file: '/content/driving-object-detection/training/train')
    Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
2020-10-23 15:53:03.698253: W ./tensorflow/core/distributed_runtime/eager/destroy_tensor_handle_node.h:57] Ignoring an error encountered when deleting remote tensors handles: Invalid argument: Unable to find the relevant tensor remote_handle: Op ID: 16039, Output num: 1
Additional GRPC error information from remote target /job:worker/replica:0/task:0:
:{"created":"@1603468383.693602692","description":"Error received from peer ipv4:10.72.50.114:8470","file":"external/com_github_grpc_grpc/src/core/lib/surface/call.cc","file_line":1056,"grpc_message":"Unable to find the relevant tensor remote_handle: Op ID: 16039, Output num: 1","grpc_status":3}

Which additional steps Am I supposed to take to prepare the files for TPU? As I mentioned before the dataset, folders structure, I followed the tensorflow.org and working well with GPU.

It is not just as simple as adding the "--use_tpu=true". Is there a step by step guide or can anyone shed some light?

Upvotes: 0