Reputation: 536
I am trying to use tf.data.Dataset.cache, but it seems to have no effect.
I have three questions:
At what point in the pipeline should I cache my dataset? I assume it should be before any mapping step that has random behavior. Is it recommended to cache the dataset right after the initial parsing from a TFRecord file, before any other mapping?
How can I measure the speed-up from caching?
I assume I will always want to cache my images in memory, or at least some portion of them, so that the pipeline feeds the network faster. When would I want to cache to a file instead?
Thanks!
Upvotes: 5
Views: 1846
Reputation: 69
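The purpose of .cache() is to speed up your data pipeline by storing your samples in memory or on disk. For all epochs after the first one, the pipeline no longer needs to read, parse, or process anything that comes before the cache call. So it is usually best to put it as late in the pipeline as you can, after the expensive deterministic steps such as parsing, but before any random transformations (augmentation, shuffling). Otherwise the same "random" results get replayed every epoch.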
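As a rough illustration of that ordering, here is a minimal sketch. The `parse` and `augment` functions are hypothetical stand-ins (not from the question), and `Dataset.range` replaces the real TFRecord source so the snippet runs on its own. It assumes TF 2.4 or newer for `tf.data.AUTOTUNE`.

```python
import tensorflow as tf

# Hypothetical stand-in for parsing a TFRecord example: deterministic and
# comparatively expensive, so its output is worth caching.
def parse(x):
    return tf.cast(x, tf.float32) / 10.0

# Hypothetical random augmentation: applied after cache() so that every
# epoch draws fresh random values instead of replaying cached ones.
def augment(x):
    return x + tf.random.uniform([], 0.0, 0.01)

dataset = (
    tf.data.Dataset.range(8)                           # placeholder for the real source
    .map(parse, num_parallel_calls=tf.data.AUTOTUNE)   # deterministic work
    .cache()                                           # in-memory cache of parsed samples
    .shuffle(8)                                        # random, so after the cache
    .map(augment)                                      # random, so after the cache
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

for epoch in range(2):
    for batch in dataset:
        print(epoch, batch.numpy())
```

The printed values should differ between the two epochs, because shuffling and augmentation run again while the parsed samples come from the cache.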
You can time your first epoch and your second epoch and compare them. The first epoch is the one that fills the cache, so any speed increase only shows up from the second epoch onward.
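One way to do that timing is sketched below. The `slow_parse` function is a made-up stand-in that sleeps for 10 ms per element to imitate expensive parsing, and `time_epochs` is a hypothetical helper, not part of TensorFlow. Your real numbers will depend on your own pipeline.

```python
import time
import tensorflow as tf

def slow_parse(x):
    # Hypothetical stand-in for expensive parsing/decoding work.
    def _work(v):
        time.sleep(0.01)
        return v
    y = tf.py_function(_work, [x], tf.int64)
    y.set_shape([])
    return y

def time_epochs(dataset, epochs=2):
    # Iterate over the full dataset `epochs` times and record wall-clock time.
    durations = []
    for _ in range(epochs):
        start = time.perf_counter()
        for _ in dataset:
            pass
        durations.append(time.perf_counter() - start)
    return durations

base = tf.data.Dataset.range(50).map(slow_parse)

uncached = time_epochs(base)         # every epoch pays the parsing cost
cached = time_epochs(base.cache())   # only the first epoch pays it

print("uncached epochs:", uncached)
print("cached epochs:  ", cached)
```

Note that the cache is only complete once the dataset has been iterated all the way through, so time full epochs rather than a handful of batches.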
Cache to a file when your images are too big to fit into memory. But disk I/O takes time too, so it only pays off if the processing you skip takes considerably longer than reading the cached samples back from disk.
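For the file case, you pass a path to .cache(). A minimal sketch, assuming a temporary directory and a made-up cache name `train_cache`. The `x * 2` map is only a placeholder for real processing.

```python
import os
import tempfile
import tensorflow as tf

cache_dir = tempfile.mkdtemp()
cache_path = os.path.join(cache_dir, "train_cache")  # hypothetical file prefix

dataset = (
    tf.data.Dataset.range(5)
    .map(lambda x: x * 2)   # placeholder for expensive processing
    .cache(cache_path)      # passing a filename caches to disk instead of memory
)

first = [int(x) for x in dataset]    # first full pass writes the cache files
second = [int(x) for x in dataset]   # later passes read from the cache files
cache_files = os.listdir(cache_dir)

print(first, second, cache_files)
```

Keep in mind that the cache files stay on disk between runs, so if you change anything before the .cache() call you should delete them or use a new path.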
Upvotes: 1