Rick Blaine

Reputation: 175

Should a TFRecord contain multiple observations or one?

In one explanation I read, a single TFRecord file contains multiple classes and multiple images (a cat and a bridge). Both images are written into one TFRecord file, and on read-back it is verified that the file contains two images.

Elsewhere I have seen people generate one TFRecord file per image. I know you can load multiple TFRecord files like this:

train_dataset = tf.data.TFRecordDataset(tf.io.gfile.glob("<Path>/*.tfrecord"))

But which way is recommended? Should I build one TFRecord file per image, or one TFRecord file for multiple images? If I put multiple images into one TFRecord file, what is the maximum number?

Upvotes: 2

Views: 568

Answers (1)

prouast

Reputation: 1196

As you said, it is possible to store an arbitrary number of entries in a single TFRecord file, and you can create as many TFRecord files as desired.
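To make this concrete, here is a minimal sketch that writes two observations into one TFRecord file and counts them on read-back. It assumes TensorFlow 2.x with eager execution. The feature names `image` and `label`, the helper names, and the placeholder byte strings standing in for encoded images are illustrative only and not taken from the question.

```python
import os
import tempfile

import tensorflow as tf


def make_example(image_bytes: bytes, label: int) -> tf.train.Example:
    """Pack one observation (raw bytes plus an integer label) into an Example."""
    feature = {
        # Feature names are hypothetical; use whatever your parser expects.
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def write_records(path: str, observations) -> None:
    """Write every (image_bytes, label) pair as its own record in one file."""
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in observations:
            writer.write(make_example(image_bytes, label).SerializeToString())


def count_records(path: str) -> int:
    """Count records by iterating over the file once."""
    return sum(1 for _ in tf.data.TFRecordDataset(path))


# Placeholder bytes stand in for two encoded images (assumption for the demo).
observations = [(b"cat-bytes", 0), (b"bridge-bytes", 1)]
path = os.path.join(tempfile.mkdtemp(), "two_images.tfrecord")
write_records(path, observations)
num_records = count_records(path)
```

The same `write_records` call works for one observation or for thousands; the number of records per file is simply however many times `writer.write` is called.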

I would recommend using practical considerations to decide how to proceed:

  • On the one hand, prefer fewer TFRecord files, since they are easier to handle when moving data around the filesystem
  • On the other hand, avoid letting a single TFRecord file grow so large that it becomes a problem for the filesystem
  • Keep in mind that it is useful to keep separate TFRecord files for the train / validation / test splits
  • Sometimes the nature of the dataset makes it obvious how to split it into separate files (for example, I have a video dataset where I use one TFRecord file per participant session)
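One way to turn these considerations into a file layout is to pick a target file size and derive the number of files per split from it. The sketch below is plain Python and only plans file names. The 100 MiB target, the `split-00000-of-00010.tfrecord` naming pattern, and the example counts and sizes are all assumptions chosen for illustration, not values from this answer.

```python
def plan_shards(split: str, num_examples: int, avg_example_bytes: int,
                target_shard_bytes: int = 100 * 1024 * 1024) -> list:
    """Return file names for one split, aiming for about target_shard_bytes per file."""
    total_bytes = num_examples * avg_example_bytes
    # Integer ceiling division; always keep at least one file per split.
    num_shards = max(1, (total_bytes + target_shard_bytes - 1) // target_shard_bytes)
    return [f"{split}-{i:05d}-of-{num_shards:05d}.tfrecord" for i in range(num_shards)]


def shard_for(example_index: int, num_shards: int) -> int:
    """Round-robin assignment of an example to a file within its split."""
    return example_index % num_shards


# Hypothetical dataset: examples of about 100 KiB each, split three ways.
avg_bytes = 100 * 1024
plan = {
    "train": plan_shards("train", 10_240, avg_bytes),
    "validation": plan_shards("validation", 1_000, avg_bytes),
    "test": plan_shards("test", 2_048, avg_bytes),
}
```

Each split gets its own set of files, so the splits never mix, and a small split still ends up in a single file. If the dataset has a natural grouping, such as one file per participant session, that grouping can replace the round-robin assignment in `shard_for`.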

Upvotes: 4
