Reputation: 109
I am trying to save my model using tf.keras.callbacks.ModelCheckpoint with filepath set to a folder in Drive, but I am getting this error:
File system scheme '[local]' not implemented (file: './ckpt/tensorflow/training_20220111-093004_temp/part-00000-of-00001')
Encountered when executing an operation using EagerExecutor. This error cancels all future operations and poisons their output tensors.
Does anybody know the reason for this error and a workaround?
Upvotes: 1
Views: 449
Reputation: 514
It looks like you are trying to access the file system of your host VM from the TPU, which is not directly possible.
When you use a TPU and need to access local files (e.g. in Google Colab), place the code that touches them inside:
with tf.device('/job:localhost'):
    <YOUR_CODE>
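As a minimal sketch of that pattern, the block below writes and reads a small file inside the device scope. The temporary path and the file name example.txt are hypothetical placeholders, and the snippet assumes TensorFlow 2.x running eagerly; on a machine without a TPU the scope simply resolves to the local CPU.

```python
import os
import tempfile

import tensorflow as tf

# Hypothetical local file; replace with the path you actually need to access.
path = os.path.join(tempfile.mkdtemp(), 'example.txt')

# Pin the file I/O ops to the local host so they run where the local files live.
with tf.device('/job:localhost'):
    tf.io.write_file(path, 'hello')
    contents = tf.io.read_file(path)

# read_file returns a scalar string tensor; decode it to a Python str.
text = contents.numpy().decode('utf-8')
```

Only the ops that touch the local file system need to sit inside the scope; the rest of the training code can stay outside it.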
Now to your problem: the local host acts as the parameter server when training on a TPU, so if you want to checkpoint your training, the local host must do it. If you check the documentation for that callback, you can find the options parameter:
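checkpoint_options = tf.train.CheckpointOptions(experimental_io_device='/job:localhost')
# In tf.keras (TF 2.x), CheckpointOptions is accepted only with save_weights_only=True;
# for full-model saves, pass tf.saved_model.SaveOptions(experimental_io_device='/job:localhost') instead.
checkpoint = tf.keras.callbacks.ModelCheckpoint(<YOUR_PATH>, save_weights_only=True, options=checkpoint_options)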
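To see the experimental_io_device option in isolation, here is a rough sketch that uses tf.train.Checkpoint directly rather than the Keras callback. This is a swapped-in illustration, not the callback from the answer: the step variable stands in for real model state and the temporary directory is a hypothetical output location. It assumes TensorFlow 2.x in eager mode.

```python
import os
import tempfile

import tensorflow as tf

# Route checkpoint reads and writes through the local host.
options = tf.train.CheckpointOptions(experimental_io_device='/job:localhost')

# Hypothetical stand-in for model state: a single tracked variable.
step = tf.Variable(3, dtype=tf.int64)
ckpt = tf.train.Checkpoint(step=step)

# Hypothetical output folder; replace with your own checkpoint directory.
ckpt_dir = tempfile.mkdtemp()
save_path = ckpt.save(os.path.join(ckpt_dir, 'ckpt'), options=options)

# Overwrite the value, then restore with the same options to check the round trip.
step.assign(0)
ckpt.restore(save_path, options=options)
restored_step = int(step.numpy())
```

The same options object is what the callback forwards to the underlying save call, so if this round trip works in your environment, the callback should be able to write to the same location.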
Hope this solves your issue!
Best, Sascha
Upvotes: 3