Reputation: 29
I am running a MobileNet model on X-ray images with TensorFlow on a GPU. I can fit the model without any errors (using a batch size of 1). However, when I call model.evaluate, it raises a ResourceExhaustedError.
Here is the model, with input shape (224, 224, 3):
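import tensorflow as tf
from tensorflow.keras.applications.mobilenet import MobileNet
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.layers import Concatenate, UpSampling2D, Conv2D, Reshape
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam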
def create_model(trainable=True):
    model = MobileNet(input_shape=(IMAGE_HEIGHT, IMAGE_WIDTH, 3), include_top=False, alpha=ALPHA, weights='imagenet')
    for layer in model.layers:
        layer.trainable = trainable
    block1 = model.get_layer("conv_pw_1_relu").output
    block2 = model.get_layer("conv_pw_3_relu").output
    block3 = model.get_layer("conv_pw_5_relu").output
    block4 = model.get_layer("conv_pw_11_relu").output
    block5 = model.get_layer("conv_pw_13_relu").output
    x = Concatenate()([UpSampling2D()(block5), block4])
    x = Concatenate()([UpSampling2D()(x), block3])
    x = Concatenate()([UpSampling2D()(x), block2])
    x = Concatenate()([UpSampling2D()(x), block1])
    x = UpSampling2D()(x)
    x = Conv2D(1, kernel_size=1, activation='sigmoid')(x)
    x = Reshape((IMAGE_HEIGHT, IMAGE_WIDTH))(x)
    return Model(inputs=model.input, outputs=x)
mirrored_strategy = tf.distribute.MirroredStrategy()
with mirrored_strategy.scope():
    model = create_model()
    model.summary()
    optimizer = Adam(lr=0.001)
    model.compile(loss=loss, optimizer=optimizer, metrics=[dice_coefficient])
checkpoint = ModelCheckpoint("model-{loss:.2f}.h5", monitor="loss", verbose=1, save_best_only=True,
                             save_weights_only=True, mode="min", period=1)
stop = EarlyStopping(monitor="loss", patience=5, mode="min")
reduce_lr = ReduceLROnPlateau(monitor="loss", factor=0.2, patience=5, min_lr=1e-6, verbose=1, mode="min")
history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                    epochs=EPOCHS,
                    batch_size=BATCH_SIZE,
                    callbacks=[checkpoint, stop, reduce_lr],
                    verbose=1)
model.evaluate(X_val, y_val, verbose=1)
Here is the error when I run model.evaluate():
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-26-3301985d3ba5> in <module>()
----> 1 model.evaluate(X_val, y_val, verbose=1)
8 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[32,224,224,1984] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/up_sampling2d_4/resize/ResizeNearestNeighbor (defined at /lib/python3.6/threading.py:916) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[[div_no_nan/ReadVariableOp_1/_22]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
(1) Resource exhausted: OOM when allocating tensor with shape[32,224,224,1984] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model/up_sampling2d_4/resize/ResizeNearestNeighbor (defined at /lib/python3.6/threading.py:916) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
0 successful operations.
0 derived errors ignored. [Op:__inference_test_function_34461]
Function call stack:
test_function -> test_function
Upvotes: 0
Views: 558
Reputation: 36594
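model.evaluate() also takes batch_size as an argument, so you should pass it again:
    model.evaluate(X_val, y_val, batch_size=BATCH_SIZE, verbose=1)
Otherwise Keras falls back to its default batch size of 32 for array inputs, which matches the leading dimension of the shape [32,224,224,1984] tensor in the error message and is far larger than the batch size of 1 you trained with.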
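To see why the batch size matters so much here, this rough sketch estimates the size of the single tensor named in the OOM message. The helper name tensor_gib is made up for illustration, and the per-block channel counts assume ALPHA = 1.0 (the question does not state ALPHA, but those counts sum to the 1984 channels reported in the error). It only counts one activation tensor, so real memory use will be higher.

```python
# Estimate the memory for the tensor from the OOM message:
# shape [32, 224, 224, 1984], dtype float32 (4 bytes per element).

BYTES_PER_FLOAT32 = 4

def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the number of bytes needed to hold a dense tensor of this shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total

def tensor_gib(shape):
    """Same as tensor_bytes, expressed in GiB (hypothetical helper for illustration)."""
    return tensor_bytes(shape) / 2**30

# Assumption: ALPHA = 1.0, so the concatenated skip connections carry
# 1024 + 512 + 256 + 128 + 64 channels (blocks 5 down to 1).
channels = 1024 + 512 + 256 + 128 + 64  # 1984, as in the error message

default_batch = (32, 224, 224, channels)  # Keras default batch size for evaluate()
training_batch = (1, 224, 224, channels)  # batch size used for fit() in the question

print(f"batch 32: {tensor_gib(default_batch):.2f} GiB")
print(f"batch 1:  {tensor_gib(training_batch):.2f} GiB")
```

With a batch of 32, this one upsampled tensor alone needs close to 12 GiB, whereas with a batch of 1 it needs well under half a GiB, which is consistent with fit() succeeding and evaluate() failing.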
Upvotes: 1