galah92

Reputation: 3991

Colab: cache dataset on TPU

I'd like to set up something similar to the following Colab notebook. I have a single 100 MB TFRecord file and I'd like to train on a TPU.

My training input function is the following:

def train_input_fn(batch_size=1024):
  dataset = tf.data.TFRecordDataset(TRAIN_RECORD)
  dataset = dataset.cache()
  dataset = dataset.repeat()
  dataset = dataset.shuffle(100)
  dataset = dataset.map(parse_fn)
  dataset = dataset.batch(batch_size, drop_remainder=True)
  return dataset

From my understanding, when using a TPU the dataset cannot reside on the machine's local disk, which is why I added dataset.cache(). But I'm still getting:

UnimplementedError (see above for traceback): File system scheme '[local]' not implemented (file: 'train.tfrecord')

Upvotes: 0

Views: 1321

Answers (2)

Chris Fregly

Reputation: 1530

TPUs require Google Cloud Storage; the local filesystem is not supported.

https://cloud.google.com/tpu/docs/troubleshooting#cannot_use_local_filesystem
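A minimal sketch of what that change could look like. The bucket name `my-bucket` is a placeholder for your own GCS bucket, and `parse_fn` is assumed to be the same parsing function from the question; the rest of the input pipeline can stay as it is, since tf.data.TFRecordDataset accepts gs:// paths directly.

```python
import tensorflow as tf

# Hypothetical GCS path; upload train.tfrecord to your own bucket first,
# e.g. with: gsutil cp train.tfrecord gs://my-bucket/data/
TRAIN_RECORD = "gs://my-bucket/data/train.tfrecord"

def train_input_fn(batch_size=1024):
    # Reading from gs:// works on the TPU host, unlike a local path.
    dataset = tf.data.TFRecordDataset(TRAIN_RECORD)
    dataset = dataset.cache()   # cache after the first pass over GCS
    dataset = dataset.repeat()
    dataset = dataset.shuffle(100)
    dataset = dataset.map(parse_fn)  # parse_fn as defined in the question
    return dataset.batch(batch_size, drop_remainder=True)
```

Note that the model directory (checkpoints, summaries) usually also needs to point at a gs:// path for the same reason.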

Upvotes: 2

Mikhail Berlinkov

Reputation: 1624

It looks like the error comes from the line dataset = tf.data.TFRecordDataset(TRAIN_RECORD), which reads from the local filesystem. I think you should load the data outside your training function, as is done in the notebook.
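One hedged sketch of that approach: parse the local file once on the host (where local-file access works), keep the examples in memory, and build the training dataset from tensors rather than a file path. The names `load_records` and the (feature, label) return shape of `parse_fn` are assumptions, not from the original notebook.

```python
import numpy as np
import tensorflow as tf

def load_records(path, parse_fn):
    """Eagerly parse a TFRecord file on the host and return numpy arrays."""
    features, labels = [], []
    for record in tf.data.TFRecordDataset(path):
        f, l = parse_fn(record)  # parse_fn assumed to return (feature, label)
        features.append(f.numpy())
        labels.append(l.numpy())
    return np.stack(features), np.stack(labels)

def train_input_fn(features, labels, batch_size=1024):
    # Build the dataset from in-memory tensors instead of a local file path,
    # so the TPU input pipeline never touches the local filesystem.
    ds = tf.data.Dataset.from_tensor_slices((features, labels))
    ds = ds.cache().repeat().shuffle(100)
    return ds.batch(batch_size, drop_remainder=True)
```

This keeps the whole 100 MB dataset in host memory, which is fine at that size; for much larger data the GCS route is the better fit.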

Upvotes: 0
