Reputation: 3210
I'm trying to iterate through different hyperparameters to build an optimal model, but after 1 iteration (training of 1 model) is completed I run out of memory when the 2nd iteration starts:
ResourceExhaustedError: OOM when allocating tensor with shape[5877,200,200,3] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:GatherV2]
I tried using ops.reset_default_graph(), but it doesn't do anything.
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import regularizers
from tensorflow.keras.layers import Dense,Activation,Flatten,Conv2D,MaxPooling2D,Dropout
import os
import cv2
import random
import pickle
import time
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.callbacks import TensorBoard
from google.colab import files
from tensorflow.python.framework import ops
p1=open("/content/tfds.pickle","rb")
def prepare_ds():
    dir = "drive//My Drive//dataset//"
    cat = os.listdir(dir)
    i = 1
    td = []
    for x in cat:
        d = dir + x
        y1 = cat.index(x)
        for img in os.listdir(d):
            im = cv2.imread(d + "//" + img)
            print(i)
            i = i + 1
            im = cv2.resize(im, (200, 200))
            td.append([im, y1])
            ## im[:,:,0],im[:,:,2]=im[:,:,2],im[:,:,0].copy()
            ## plt.imshow(im)
            ## plt.show()
    random.shuffle(td)
    X = []
    Y = []
    for a1, a2 in td:
        X.append(a1)
        Y.append(a2)
    X = np.array(X).reshape(-1, 200, 200, 3)
    Y = np.array(Y).reshape(-1, 1)
    pickle.dump([X, Y], p1)

##prepare_ds()
X, Y = pickle.load(p1)
X = X / 255.0
def learn():
    model = tf.keras.models.Sequential()
    model.add(Conv2D(lsi, (3, 3), input_shape=X.shape[1:]))
    model.add(Activation("relu"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    for l in range(cli - 1):
        model.add(Conv2D(lsi, (3, 3)))
        model.add(Activation("relu"))
        model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    for l in range(dli):
        model.add(Dense(lsi))
        model.add(Activation("relu"))
        model.add(Dropout(0.5))
    model.add(Dense(10))
    model.add(Activation('softmax'))
    model.compile(loss="sparse_categorical_crossentropy", optimizer="adam", metrics=['accuracy'])
    model.fit(X, Y, batch_size=16, validation_split=0.1, epochs=3, verbose=2, callbacks=[tb])
    model.save('tm1.h5')
    ops.reset_default_graph()
dl=[0,1,2]
ls=[32,64,128]
cl=[1,2,3]
for dli in dl:
    for lsi in ls:
        for cli in cl:
            ops.reset_default_graph()
            NAME = "{}-conv-{}-nodes-{}-dense".format(cli, lsi, dli)
            tb = TensorBoard(log_dir="logs//{}".format(NAME))
            print(NAME)
            learn()
p1.close()
!zip -r /content/file.zip /content/logs
!cp file.zip "/content/drive/My Drive/"
Upvotes: 5
Views: 33183
Reputation: 1
Reducing batch_size works for me. Neither deleting variables, empty_cache, nor killing the process worked. Google Colab replaces the old variables with the new ones, so just reduce batch_size, load the data again, and run training (no need to restart the runtime).
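In the question's code, for example, that just means lowering the batch_size argument passed to model.fit:
# Same fit call as in the question, only with a smaller batch size (8 instead of 16)
model.fit(X, Y, batch_size=8, validation_split=0.1, epochs=3, verbose=2, callbacks=[tb])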
Upvotes: 0
Reputation: 141
If your data is taking a significant amount of memory, you can conserve memory by using tf.data.Dataset structures instead of lists. There are Keras helper methods like tf.keras.utils.image_dataset_from_directory that lazily load batches of images as needed.
You can read more about it on the Keras data loading web page.
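A minimal sketch, assuming the images live in one subfolder per class under the question's dataset directory (path, image size, and batch size are placeholders to adapt):
import tensorflow as tf

# Lazily loads batches from disk instead of keeping the whole array in RAM
train_ds = tf.keras.utils.image_dataset_from_directory(
    "drive/My Drive/dataset/",
    labels="inferred",
    label_mode="int",          # integer labels work with sparse_categorical_crossentropy
    image_size=(200, 200),
    batch_size=16,
    shuffle=True,
    seed=42,
)

# Rescale pixels to [0, 1] on the fly rather than materialising X / 255.0
train_ds = train_ds.map(lambda x, y: (x / 255.0, y))

# model.fit(train_ds, epochs=3)  # fit accepts a tf.data.Dataset directly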
Upvotes: 0
Reputation: 918
This might be a late answer to the question but hopefully someone could find it useful.
A workaround to free some memory in Google Colab is to delete variables that are no longer needed.
Click on the Variables inspector window on the left side.
See which variables you do not need and just delete them.
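The same can be done from code; a small sketch, assuming the large arrays are named X and Y as in the question:
import gc

del X, Y      # drop the references to the large arrays
gc.collect()  # ask Python to reclaim the memory right away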
Upvotes: 5
Reputation: 586
You can use Python's built-in garbage collector module. I often create a custom callback that calls it at the end of each epoch. You can think of it as clearing cached information you no longer need.
# Garbage collector - use it like gc.collect()
import gc

import tensorflow as tf

# Custom callback to include in the callbacks list at training time
class GarbageCollectorCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        gc.collect()
Additionally, just try running the command gc.collect()
by itself to see the results and how it works. Here is some documentation on how it works. I often use it to keep my kernel's memory footprint small in kernels-only Kaggle competitions.
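A short usage sketch (model, data, and the TensorBoard callback tb are the ones already defined in the question; only the extra entry in the callbacks list is new):
model.fit(X, Y, batch_size=16, validation_split=0.1, epochs=3, verbose=2,
          callbacks=[tb, GarbageCollectorCallback()])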
I hope this helps!
Upvotes: 19