Ramna

Reputation: 29

TensorFlow GPU ResourceExhaustedError when calling model.evaluate(), but model.fit() works fine

I am running a MobileNet model on X-ray images on a TensorFlow GPU setup. I am able to fit the model without any errors (using batch size = 1). However, when I try to call model.evaluate(), it gives me a ResourceExhaustedError.

Here is the model, with input shape (224, 224, 3):

import tensorflow as tf
from tensorflow.keras.applications.mobilenet import MobileNet
from tensorflow.keras.layers import Concatenate, UpSampling2D, Conv2D, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping, ReduceLROnPlateau

def create_model(trainable=True):
    # IMAGE_HEIGHT, IMAGE_WIDTH (224) and ALPHA are defined elsewhere in the notebook
    model = MobileNet(input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top=False, alpha=ALPHA, weights='imagenet')

    for layer in model.layers:
        layer.trainable = trainable

    # Feature maps tapped from successive MobileNet stages, used as skip connections
    block1 = model.get_layer("conv_pw_1_relu").output
    block2 = model.get_layer("conv_pw_3_relu").output
    block3 = model.get_layer("conv_pw_5_relu").output
    block4 = model.get_layer("conv_pw_11_relu").output
    block5 = model.get_layer("conv_pw_13_relu").output

    # Decoder: progressively upsample and fuse with the skip connections
    x = Concatenate()([UpSampling2D()(block5), block4])
    x = Concatenate()([UpSampling2D()(x), block3])
    x = Concatenate()([UpSampling2D()(x), block2])
    x = Concatenate()([UpSampling2D()(x), block1])
    x = UpSampling2D()(x)

    # Per-pixel sigmoid output, reshaped to a 2-D segmentation mask
    x = Conv2D(1, kernel_size=1, activation='sigmoid')(x)
    x = Reshape((IMAGE_HEIGHT, IMAGE_WIDTH))(x)

    return Model(inputs=model.input, outputs=x)

mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = create_model()
    model.summary()

    # loss and dice_coefficient are custom functions defined elsewhere
    optimizer = Adam(learning_rate=0.001)
    model.compile(loss=loss, optimizer=optimizer, metrics=[dice_coefficient])

    # Callbacks: checkpoint on best training loss, early stopping, and LR decay
    checkpoint = ModelCheckpoint("model-{loss:.2f}.h5", monitor="loss", verbose=1,
                                 save_best_only=True, save_weights_only=True,
                                 mode="min", period=1)
    stop = EarlyStopping(monitor="loss", patience=5, mode="min")
    reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=5,
                                  min_lr=1e-6, verbose=1, mode="min")

history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    callbacks=[checkpoint, stop, reduce_lr],
                    verbose=1)
model.evaluate(X_val, y_val, verbose=1)

Here is the error when I run model.evaluate():

ResourceExhaustedError                    Traceback (most recent call last)
<ipython-input-26-3301985d3ba5> in <module>()
----> 1 model.evaluate(X_val, y_val, verbose=1)

8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     ctx.ensure_initialized()
     59     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60                                         inputs, attrs, num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[32,224,224,1984] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node model/up_sampling2d_4/resize/ResizeNearestNeighbor (defined at /lib/python3.6/threading.py:916) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[div_no_nan/ReadVariableOp_1/_22]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted:  OOM when allocating tensor with shape[32,224,224,1984] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[node model/up_sampling2d_4/resize/ResizeNearestNeighbor (defined at /lib/python3.6/threading.py:916) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_34461]

Function call stack:
test_function -> test_function

Upvotes: 0

Views: 558

Answers (1)

Nicolas Gervais

Reputation: 36594

model.evaluate() also takes batch_size as an argument, so you should pass it here as well:

batch_size=BATCH_SIZE

Otherwise Keras falls back to its default batch size of 32, which is exactly the batch dimension in your OOM message (shape[32,224,224,1984]) and far larger than the batch size of 1 you trained with.
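
A minimal sketch of the corrected call, assuming BATCH_SIZE is the same value (1) you used for model.fit():

# Pass an explicit batch size so Keras does not fall back to its default of 32
val_loss, val_dice = model.evaluate(X_val, y_val, batch_size=BATCH_SIZE, verbose=1)

Since you compiled with metrics=[dice_coefficient], evaluate() returns the loss followed by the dice coefficient.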

Upvotes: 1
