Reputation: 159
I am trying to manage a large image dataset that does not fit in memory, while running some specific calculations on it. Currently, my code looks like this:
files = [str(f) for f in self.files]
labels = self.categories
batch_size = 32
dataset = tf.data.Dataset.from_generator(
    lambda: zip(files, labels),
    output_types=(tf.string, tf.uint8),
    output_shapes=(tf.TensorShape([]), tf.TensorShape([]))
)
dataset = dataset.map(
    lambda x, y: tf.py_function(_parser, [x, y, category_count], [tf.float32, tf.uint8]),
    num_parallel_calls=tf.data.experimental.AUTOTUNE,
    deterministic=False)
dataset.cache(filename='/tmp/dataset.tmp')
if mode == tf.estimator.ModeKeys.TRAIN:
    dataset = dataset.shuffle(buffer_size=10*batch_size, reshuffle_each_iteration=True)
dataset = dataset.batch(batch_size=batch_size, drop_remainder=False)
if mode == tf.estimator.ModeKeys.TRAIN:
    dataset.repeat(None)
else:
    dataset.repeat(1)
dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
The _parser() function opens an image file, does a bunch of transformations, and returns a tensor and a one-hot encoded vector; a hypothetical sketch of such a function is shown below.
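For reference, a hypothetical sketch of what such a parser might look like (the actual transformations are not shown in the question, so the decode and one-hot details here are assumptions):

import tensorflow as tf

# Hypothetical stand-in for the _parser() described above; the real
# transformations are not given in the question.
def _parser(filename, label, category_count):
    image = tf.io.read_file(filename)
    image = tf.io.decode_image(image, channels=3, expand_animations=False)
    image = tf.image.convert_image_dtype(image, tf.float32)       # float tensor
    one_hot = tf.one_hot(tf.cast(label, tf.int32), category_count,
                         dtype=tf.uint8)                           # 1-hot vector
    return image, one_hot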
The caching step does not seem to work properly, however. Does the cache() function create a file only when both the memory and the swap partition are full? Furthermore, I expect only batch_size files to be read at a time, yet it seems that all files are read at once during the mapping step. Should I consider using interleave() combined with from_generator() instead? Or should I batch the files first and then map them?
Upvotes: 1
Views: 1492
Reputation: 3354
In general, it is not true that a lot of RAM is needed to cache. However, as opposed to other libraries (like gensim or hugging-face), TensorFlow's basic caching is simplistic by design choice. As of now (checked under version 2.12), it uses RAM excessively and does not handle garbage collection well. This snippet demonstrates that caching consumes RAM linearly with the data size and does not free the resources in the second epoch:
import numpy as np
import tensorflow as tf
import psutil

IMG_SHAPE = (224, 224, 3)

def gen_img(shape=IMG_SHAPE):
    # Endless stream of random "images" and integer labels.
    while True:
        img = np.random.randint(0, 256, size=shape, dtype=np.int32)  # int32 to match the TensorSpec below
        lab = np.random.randint(0, 10)
        yield (img, lab)

ds = tf.data.Dataset.from_generator(
    gen_img,
    output_signature=(
        tf.TensorSpec(shape=IMG_SHAPE, dtype=tf.int32),
        tf.TensorSpec(shape=(), dtype=tf.int32)
    )
)

# !rm ./my_cached_dataset*
ds = ds.take(int(1e4)).cache('./my_cached_dataset').repeat(2)

for i, (img, lab) in enumerate(ds):
    if i % 1000 == 0:
        print(psutil.virtual_memory())
This gives the following results on Google Colab
svmem(total=13613314048, available=11903979520, percent=12.6, used=1375707136, free=9484337152, active=541478912, inactive=3288244224, buffers=42360832, cached=2710908928, shared=2764800, slab=214601728)
svmem(total=13613314048, available=11349929984, percent=16.6, used=1927380992, free=9098743808, active=538705920, inactive=3477790720, buffers=42557440, cached=2544631808, shared=2772992, slab=236298240)
svmem(total=13613314048, available=11246673920, percent=17.4, used=2030444544, free=8296189952, active=539701248, inactive=4435984384, buffers=43372544, cached=3243307008, shared=2772992, slab=266022912)
svmem(total=13613314048, available=10702491648, percent=21.4, used=2574770176, free=6455230464, active=543043584, inactive=6231724032, buffers=43421696, cached=4539891712, shared=2772992, slab=300105728)
svmem(total=13613314048, available=10379468800, percent=23.8, used=2897776640, free=4922003456, active=543133696, inactive=7728226304, buffers=43446272, cached=5750087680, shared=2772992, slab=334139392)
svmem(total=13613314048, available=10069651456, percent=26.0, used=3207753728, free=3356516352, active=543207424, inactive=9257857024, buffers=43511808, cached=7005532160, shared=2772992, slab=369360896)
svmem(total=13613314048, available=9731747840, percent=28.5, used=3545501696, free=1802670080, active=543256576, inactive=10778898432, buffers=43560960, cached=8221581312, shared=2772992, slab=403521536)
svmem(total=13613314048, available=9435697152, percent=30.7, used=3841613824, free=266637312, active=543305728, inactive=12278542336, buffers=43610112, cached=9461452800, shared=2772992, slab=438865920)
svmem(total=13613314048, available=9271164928, percent=31.9, used=4006137856, free=193994752, active=543870976, inactive=12340707328, buffers=43122688, cached=9370058752, shared=2772992, slab=440442880)
svmem(total=13613314048, available=8968581120, percent=34.1, used=4308578304, free=169992192, active=543911936, inactive=12344811520, buffers=42754048, cached=9091989504, shared=2772992, slab=435945472)
svmem(total=13613314048, available=8662331392, percent=36.4, used=4615012352, free=169848832, active=543952896, inactive=12350521344, buffers=42803200, cached=8785649664, shared=2772992, slab=428064768)
svmem(total=13613314048, available=9466744832, percent=30.5, used=3810525184, free=163422208, active=543965184, inactive=12362956800, buffers=42827776, cached=9596538880, shared=2772992, slab=416862208)
svmem(total=13613314048, available=9460772864, percent=30.5, used=3816542208, free=155451392, active=543985664, inactive=12395225088, buffers=42835968, cached=9598484480, shared=2772992, slab=382918656)
svmem(total=13613314048, available=9467899904, percent=30.5, used=3809370112, free=160645120, active=543797248, inactive=12427423744, buffers=42835968, cached=9600462848, shared=2772992, slab=349220864)
svmem(total=13613314048, available=9470406656, percent=30.4, used=3806834688, free=161198080, active=543805440, inactive=12460277760, buffers=42835968, cached=9602445312, shared=2772992, slab=315473920)
svmem(total=13613314048, available=9479843840, percent=30.4, used=3797512192, free=161202176, active=543797248, inactive=12491632640, buffers=42835968, cached=9611763712, shared=2772992, slab=291315712)
svmem(total=13613314048, available=9487978496, percent=30.3, used=3789242368, free=166912000, active=543797248, inactive=12523065344, buffers=42835968, cached=9614323712, shared=2772992, slab=262230016)
svmem(total=13613314048, available=9505796096, percent=30.2, used=3771478016, free=183492608, active=543797248, inactive=12555304960, buffers=42835968, cached=9615507456, shared=2772992, slab=229867520)
svmem(total=13613314048, available=9550958592, percent=29.8, used=3728154624, free=183087104, active=543797248, inactive=12567662592, buffers=42835968, cached=9659236352, shared=2772992, slab=215998464)
svmem(total=13613314048, available=9558626304, percent=29.8, used=3720568832, free=190627840, active=543797248, inactive=12567326720, buffers=42835968, cached=9659281408, shared=2772992, slab=215982080)
In an attempt to make up for these shortcomings, TensorFlow has released a more advanced cache-like operation called snapshot. But as of now (July '23) it is experimental and poorly documented.
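For completeness, a minimal sketch of how the experimental snapshot transformation can be applied to an existing dataset ds (the path is illustrative):

import tensorflow as tf

# Write the pipeline's elements to disk the first time they are produced;
# subsequent epochs read them back from the snapshot files.
ds = ds.apply(
    tf.data.experimental.snapshot(
        '/tmp/my_snapshot_dir',   # illustrative path
        compression='AUTO',
    )
)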
Upvotes: 0
Reputation:
Note that cache() should be used when the dataset is small enough to fit into memory. If the dataset is large (which is the case here), RAM will not be sufficient to hold its content. Either increase the amount of RAM or adopt some other method to speed up training.
The other reason for the slowdown of training is the preprocessing stage, where you use the map() function. The map() method applies a transformation to each element individually, whereas the apply() method applies a transformation to the dataset as a whole.
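A small illustration of that difference on toy data (the lambdas here are placeholders, not part of the original pipeline):

import tensorflow as tf

ds = tf.data.Dataset.range(10)

# map(): the function receives one element at a time.
doubled = ds.map(lambda x: x * 2)

# apply(): the function receives the whole dataset and returns a new dataset.
evens = ds.apply(lambda d: d.filter(lambda x: x % 2 == 0))

print(list(doubled.as_numpy_iterator()))  # [0, 2, 4, ..., 18]
print(list(evens.as_numpy_iterator()))    # [0, 2, 4, 6, 8]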
You can use interleave() to read the files while retaining the same ordering of map() and then batch(), as sketched below.
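A sketch of that ordering, assuming files and labels are the equal-length Python lists from the question and using hypothetical load_image/preprocess helpers in place of the real transformations:

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

# Hypothetical per-file loader: reads and decodes one image from disk.
def load_image(path, label):
    image = tf.io.read_file(path)
    image = tf.io.decode_image(image, channels=3, expand_animations=False)
    return image, label

# Hypothetical preprocessing step applied after loading.
def preprocess(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32)
    image = tf.image.resize(image, (224, 224))
    return image, label

# files and labels are assumed to be the lists from the question.
files_ds = tf.data.Dataset.from_tensor_slices((files, labels))

# interleave() overlaps the reading/decoding of several files at once.
ds = files_ds.interleave(
    lambda path, label: tf.data.Dataset.from_tensors((path, label)).map(load_image),
    cycle_length=4,
    num_parallel_calls=AUTOTUNE,
)
ds = ds.map(preprocess, num_parallel_calls=AUTOTUNE)  # then map() ...
ds = ds.batch(32)                                     # ... and then batch()
ds = ds.prefetch(AUTOTUNE)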
You are already using parallelism through num_parallel_calls, and setting it to tf.data.experimental.AUTOTUNE makes the best use of whatever resources are available.
You can also normalize your input data and then cache it; if the result still does not fit into memory, it is better not to cache a large dataset at all.
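For instance, a sketch of that ordering, assuming ds already yields decoded (image, label) pairs and using a hypothetical normalize step:

import tensorflow as tf

AUTOTUNE = tf.data.experimental.AUTOTUNE

# Hypothetical normalization step applied before caching.
def normalize(image, label):
    return tf.cast(image, tf.float32) / 255.0, label

ds = ds.map(normalize, num_parallel_calls=AUTOTUNE)
ds = ds.cache()                 # in-memory cache: only viable for small datasets
# ds = ds.cache('/tmp/ds.tmp')  # file-backed cache: an alternative when RAM is tight
ds = ds.shuffle(1000).batch(32).prefetch(AUTOTUNE)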
You can follow the performance tips in the TensorFlow tf.data guide. If you have multiple workers/devices, they will also help you speed up training.
[Illustration: prefetching combined with multithreaded loading and preprocessing]
Upvotes: 3