Maral

Reputation: 75

Loading data onto the GPU before starting training in Google Colab

I am using a subset of the PlantVillage image dataset stored on my Google Drive and training CNN models on it from Google Colab (with a GPU, of course). The problem is that the first epoch of training runs very slowly because the data is being loaded into the GPU for the first time. Later epochs run much faster and take a predictable amount of time. Is it possible to do the loading before training starts, so that it is excluded from the training time? I want to %%time my training, and having this extra loading time included distorts the measurement.

I use TensorFlow and Keras applications for data preprocessing and model training.

Upvotes: 2

Views: 299

Answers (1)

user11530462

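You can use Dataset.cache() and Dataset.prefetch(). cache() keeps the data in memory once it has been loaded from disk during the first pass over the dataset, and prefetch() overlaps data loading and preprocessing with model execution, so subsequent epochs train faster.

See the code below: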

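import tensorflow as tf

# Let tf.data choose the prefetch buffer size at runtime.
AUTOTUNE = tf.data.AUTOTUNE

# cache() stores elements in memory after the first full pass;
# prefetch() prepares upcoming batches while the current one is training.
train_ds = train_ds.cache().prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)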
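Note that cache() only fills while the dataset is iterated for the first time, so on its own it still leaves the load cost inside the first epoch. One way to keep that cost out of the timed training is to run a full warm-up pass over the cached dataset before calling fit(). The following is a minimal sketch, not a definitive implementation: the random tensors and the tiny model are stand-ins (assumptions) for the real PlantVillage pipeline and CNN, and the cached data lives in host memory, not on the GPU.

```python
import time
import tensorflow as tf

# Stand-in data (assumption): replace with the real train_ds built from
# the images on Google Drive.
images = tf.random.uniform((64, 32, 32, 3))
labels = tf.random.uniform((64,), maxval=4, dtype=tf.int32)
train_ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(16)

train_ds = train_ds.cache().prefetch(tf.data.AUTOTUNE)

# Warm-up pass: iterating the whole dataset once fills the cache,
# so the slow first read happens here instead of inside fit().
start = time.perf_counter()
num_batches = sum(1 for _ in train_ds)
warmup_seconds = time.perf_counter() - start

# Stand-in model (assumption): replace with the real CNN.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4),
])
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# Timed section: only training, reading from the already-filled cache.
start = time.perf_counter()
history = model.fit(train_ds, epochs=2, verbose=0)
train_seconds = time.perf_counter() - start

print(f"warm-up: {warmup_seconds:.2f}s, training: {train_seconds:.2f}s")
```

In Colab, the warm-up loop can go in its own cell so that %%time is applied only to the cell that calls fit(). The first epoch may still be somewhat slower than later ones for reasons unrelated to data loading, such as graph tracing. If the dataset does not fit in memory, cache() also accepts a file path and caches to disk instead.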

Please have a look at this link for your reference.

Upvotes: 1
