Reputation: 313
I wanna make a model with multiple inputs. So, I try to build a model like this.
# define two sets of inputs
inputA = Input(shape=(32,64,1))
inputB = Input(shape=(32,1024))
# CNN
x = layers.Conv2D(32, kernel_size = (3, 3), activation = 'relu')(inputA)
x = layers.Conv2D(32, (3,3), activation='relu')(x)
x = layers.MaxPooling2D(pool_size=(2,2))(x)
x = layers.Dropout(0.2)(x)
x = layers.Flatten()(x)
x = layers.Dense(500, activation = 'relu')(x)
x = layers.Dropout(0.5)(x)
x = layers.Dense(500, activation='relu')(x)
x = Model(inputs=inputA, outputs=x)
# DNN
y = layers.Flatten()(inputB)
y = Dense(64, activation="relu")(y)
y = Dense(250, activation="relu")(y)
y = Dense(500, activation="relu")(y)
y = Model(inputs=inputB, outputs=y)
# Combine the output of the two models
combined = concatenate([x.output, y.output])
# combined outputs
z = Dense(300, activation="relu")(combined)
z = Dense(100, activation="relu")(combined)
z = Dense(1, activation="softmax")(combined)
model = Model(inputs=[x.input, y.input], outputs=z)
model.summary()
opt = Adam(lr=1e-3, decay=1e-3 / 200)
model.compile(loss = 'sparse_categorical_crossentropy', optimizer = opt,
metrics = ['accuracy'])
and the summary : _
But, when i try to train this model,
history = model.fit([trainimage, train_product_embd],train_label,
validation_data=([validimage,valid_product_embd],valid_label), epochs=10,
steps_per_epoch=100, validation_steps=10)
the problem happens.... :
ResourceExhaustedError Traceback (most recent call
last) <ipython-input-18-2b79f16d63c0> in <module>()
----> 1 history = model.fit([trainimage, train_product_embd],train_label,
validation_data=([validimage,valid_product_embd],valid_label),
epochs=10, steps_per_epoch=100, validation_steps=10)
4 frames
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/client/session.py
in __call__(self, *args, **kwargs) 1470 ret =
tf_session.TF_SessionRunCallable(self._session._session, 1471
self._handle, args,
-> 1472 run_metadata_ptr) 1473 if run_metadata: 1474
proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
ResourceExhaustedError: 2 root error(s) found. (0) Resource
exhausted: OOM when allocating tensor with shape[800000,32,30,62] and
type float on /job:localhost/replica:0/task:0/device:GPU:0 by
allocator GPU_0_bfc [[{{node conv2d_1/convolution}}]] Hint: If you
want to see a list of allocated tensors when OOM happens, add
report_tensor_allocations_upon_oom to RunOptions for current
allocation info.
[[metrics/acc/Mean_1/_185]] Hint: If you want to see a list of
allocated tensors when OOM happens, add
report_tensor_allocations_upon_oom to RunOptions for current
allocation info.
(1) Resource exhausted: OOM when allocating tensor with
shape[800000,32,30,62] and type float on
/job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node conv2d_1/convolution}}]] Hint: If you want to see a list of
allocated tensors when OOM happens, add
report_tensor_allocations_upon_oom to RunOptions for current
allocation info.
0 successful operations. 0 derived errors ignored.
Thanks for reading and hopefully helping me :)
Upvotes: 21
Views: 67638
Reputation: 5879
In my case, a stale process was using most of the RAM available on the GPU.
Simple solution: (for Nvidia GPUs)
nvidia-smi
on the terminal and note the PID of the process(es) with large memory usagesudo kill PID
, with the PID from aboveUpvotes: 0
Reputation: 1
Solutions :
Upvotes: 0
Reputation: 16
I think the most common reason for this case to arise would be the absence of MaxPooling layers. Use the same architecture, but add atleast one MaxPool layer after Conv2D layers. This might even improve the overall performance of the model. You can even try reducing the depth of the model, i.e., remove the unnecessary layers and optimize.
Upvotes: 0
Reputation: 36594
OOM stands for "out of memory". Your GPU is running out of memory, so it can't allocate memory for this tensor. There are a few things you can do:
Dense
, Conv2D
layersbatch_size
(or increase steps_per_epoch
and validation_steps
)tf.image.rgb_to_grayscale
)MaxPooling2D
layers after convolutional layerstf.image.resize
for that)float
precision for your input, namely np.float32
There is more useful information about this error:
OOM when allocating tensor with shape[800000,32,30,62]
This is a weird shape. If you're working with images, you should normally have 3 or 1 channel. On top of that, it seems like you are passing your entire dataset at once; you should instead pass it in batches.
Upvotes: 60
Reputation: 53
Happened to me as well.
You can try reducing trainable parameters by using some form of Transfer Learning - try freezing the initial few layers and use lower batch sizes.
Upvotes: 0
Reputation: 2430
From [800000,32,30,62]
it seems your model put all the data in one batch.
Try specified batch size like
history = model.fit([trainimage, train_product_embd],train_label, validation_data=([validimage,valid_product_embd],valid_label), epochs=10, steps_per_epoch=100, validation_steps=10, batch_size=32)
If it still OOM then try reduce the batch_size
Upvotes: 2