Reputation: 45981
I'm using Google Colaboratory to train the following U-Net network:
# tf.keras imports used by this model
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Dropout, UpSampling2D, concatenate

def unet(pretrained_weights=None, input_size=(240, 240, 1)):
    inputs = Input(input_size)
    # Contracting path (encoder)
    conv1 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(inputs)
    conv1 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv1)
    pool1 = MaxPooling2D(pool_size=(2, 2))(conv1)
    conv2 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool1)
    conv2 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv2)
    pool2 = MaxPooling2D(pool_size=(2, 2))(conv2)
    conv3 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool2)
    conv3 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv3)
    pool3 = MaxPooling2D(pool_size=(2, 2))(conv3)
    conv4 = Conv2D(512, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool3)
    conv4 = Conv2D(512, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv4)
    drop4 = Dropout(0.5)(conv4)
    pool4 = MaxPooling2D(pool_size=(2, 2))(drop4)
    # Bottleneck
    conv5 = Conv2D(1024, 3, activation='relu', padding='same', kernel_initializer='he_normal')(pool4)
    conv5 = Conv2D(1024, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv5)
    drop5 = Dropout(0.5)(conv5)
    # Expanding path (decoder) with skip connections
    up6 = Conv2D(512, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(drop5))
    merge6 = concatenate([drop4, up6], axis=3)
    conv6 = Conv2D(512, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge6)
    conv6 = Conv2D(512, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv6)
    up7 = Conv2D(256, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv6))
    merge7 = concatenate([conv3, up7], axis=3)
    conv7 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge7)
    conv7 = Conv2D(256, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv7)
    up8 = Conv2D(128, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv7))
    merge8 = concatenate([conv2, up8], axis=3)
    conv8 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge8)
    conv8 = Conv2D(128, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv8)
    up9 = Conv2D(64, 2, activation='relu', padding='same', kernel_initializer='he_normal')(UpSampling2D(size=(2, 2))(conv8))
    merge9 = concatenate([conv1, up9], axis=3)
    conv9 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(merge9)
    conv9 = Conv2D(64, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv9)
    conv9 = Conv2D(2, 3, activation='relu', padding='same', kernel_initializer='he_normal')(conv9)
    conv10 = Conv2D(1, 1, activation='sigmoid')(conv9)

    model = Model(inputs=inputs, outputs=conv10)
    model.compile(tf.keras.optimizers.Adam(lr=1e-4), loss='binary_crossentropy', metrics=['accuracy'])
    # model.summary()

    if pretrained_weights:
        model.load_weights(pretrained_weights)

    return model
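The model is built and inspected roughly like this (a minimal sketch; the exact call site is not shown in the question, so the variable name and arguments are assumptions):
model = unet(input_size=(240, 240, 1))  # hypothetical call, mirroring the defaults above
model.summary()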
With this summary:
<class 'tensorflow.python.keras.engine.training.Model'>
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape           Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(None, 240, 240, 1)]  0
conv2d (Conv2D)                 (None, 240, 240, 64)   640         input_1[0][0]
conv2d_1 (Conv2D)               (None, 240, 240, 64)   36928       conv2d[0][0]
max_pooling2d (MaxPooling2D)    (None, 120, 120, 64)   0           conv2d_1[0][0]
conv2d_2 (Conv2D)               (None, 120, 120, 128)  73856       max_pooling2d[0][0]
conv2d_3 (Conv2D)               (None, 120, 120, 128)  147584      conv2d_2[0][0]
max_pooling2d_1 (MaxPooling2D)  (None, 60, 60, 128)    0           conv2d_3[0][0]
conv2d_4 (Conv2D)               (None, 60, 60, 256)    295168      max_pooling2d_1[0][0]
conv2d_5 (Conv2D)               (None, 60, 60, 256)    590080      conv2d_4[0][0]
max_pooling2d_2 (MaxPooling2D)  (None, 30, 30, 256)    0           conv2d_5[0][0]
conv2d_6 (Conv2D)               (None, 30, 30, 512)    1180160     max_pooling2d_2[0][0]
conv2d_7 (Conv2D)               (None, 30, 30, 512)    2359808     conv2d_6[0][0]
dropout (Dropout)               (None, 30, 30, 512)    0           conv2d_7[0][0]
max_pooling2d_3 (MaxPooling2D)  (None, 15, 15, 512)    0           dropout[0][0]
conv2d_8 (Conv2D)               (None, 15, 15, 1024)   4719616     max_pooling2d_3[0][0]
conv2d_9 (Conv2D)               (None, 15, 15, 1024)   9438208     conv2d_8[0][0]
dropout_1 (Dropout)             (None, 15, 15, 1024)   0           conv2d_9[0][0]
up_sampling2d (UpSampling2D)    (None, 30, 30, 1024)   0           dropout_1[0][0]
conv2d_10 (Conv2D)              (None, 30, 30, 512)    2097664     up_sampling2d[0][0]
concatenate (Concatenate)       (None, 30, 30, 1024)   0           dropout[0][0], conv2d_10[0][0]
conv2d_11 (Conv2D)              (None, 30, 30, 512)    4719104     concatenate[0][0]
conv2d_12 (Conv2D)              (None, 30, 30, 512)    2359808     conv2d_11[0][0]
up_sampling2d_1 (UpSampling2D)  (None, 60, 60, 512)    0           conv2d_12[0][0]
conv2d_13 (Conv2D)              (None, 60, 60, 256)    524544      up_sampling2d_1[0][0]
concatenate_1 (Concatenate)     (None, 60, 60, 512)    0           conv2d_5[0][0], conv2d_13[0][0]
conv2d_14 (Conv2D)              (None, 60, 60, 256)    1179904     concatenate_1[0][0]
conv2d_15 (Conv2D)              (None, 60, 60, 256)    590080      conv2d_14[0][0]
up_sampling2d_2 (UpSampling2D)  (None, 120, 120, 256)  0           conv2d_15[0][0]
conv2d_16 (Conv2D)              (None, 120, 120, 128)  131200      up_sampling2d_2[0][0]
concatenate_2 (Concatenate)     (None, 120, 120, 256)  0           conv2d_3[0][0], conv2d_16[0][0]
conv2d_17 (Conv2D)              (None, 120, 120, 128)  295040      concatenate_2[0][0]
conv2d_18 (Conv2D)              (None, 120, 120, 128)  147584      conv2d_17[0][0]
up_sampling2d_3 (UpSampling2D)  (None, 240, 240, 128)  0           conv2d_18[0][0]
conv2d_19 (Conv2D)              (None, 240, 240, 64)   32832       up_sampling2d_3[0][0]
concatenate_3 (Concatenate)     (None, 240, 240, 128)  0           conv2d_1[0][0], conv2d_19[0][0]
conv2d_20 (Conv2D)              (None, 240, 240, 64)   73792       concatenate_3[0][0]
conv2d_21 (Conv2D)              (None, 240, 240, 64)   36928       conv2d_20[0][0]
conv2d_22 (Conv2D)              (None, 240, 240, 2)    1154        conv2d_21[0][0]
conv2d_23 (Conv2D)              (None, 240, 240, 1)    3           conv2d_22[0][0]
==================================================================================================
Total params: 31,031,685
Trainable params: 31,031,685
Non-trainable params: 0
When I train this network with the following code:
results = model.fit(X_train, y_train, batch_size=32, epochs=5,
                    validation_data=(X_valid, y_valid))
I get this error:
Train on 864 samples, validate on 96 samples
Epoch 1/5
32/864 [>.............................] - ETA: 4:20
---------------------------------------------------------------------------
ResourceExhaustedError Traceback (most recent call last)
<ipython-input-12-bed1e9ed5833> in <module>()
3
4 results = model.fit(X_train, y_train, batch_size=32, epochs=5,
----> 5 validation_data=(X_valid, y_valid))
6
11 frames
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
ResourceExhaustedError: OOM when allocating tensor with shape[32,128,240,240] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node Conv2DBackpropFilter_4-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_distributed_function_3113]
Function call stack:
distributed_function
Any suggestions to improve my network? Maybe this happens because I'm using images with float pixels whose values range between 0.0 and 1684.0.
Another possibility is that Google Colaboratory is full at the moment. I have tried five times: four runs gave this error and only one run succeeded.
Upvotes: 0
Views: 1080
Reputation: 15053
The problem is that you run out of GPU memory while training the neural network; it is not related to the float values of the pixels.
The solution is to gradually reduce the batch_size parameter.
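As a rough sanity check (assuming float32 activations, 4 bytes each, which is the default here), the single tensor named in the error message already needs close to a gigabyte, and backpropagation keeps many such activation tensors alive at once, which is why shrinking the batch dimension helps:
# back-of-the-envelope size of the tensor from the OOM message: shape [32, 128, 240, 240], float32
bytes_needed = 32 * 128 * 240 * 240 * 4
print(bytes_needed / 1024 ** 3)  # roughly 0.88 GiB for this one activation tensor alone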
If you pay attention here:
results = model.fit(X_train, y_train, batch_size=32, epochs=5,
validation_data=(X_valid, y_valid))
you will see that the batch_size is set to 32 (which is also the Keras default).
Reduce it to 16. If it still throws OOM errors, reduce it to 8; keep halving the batch size until the OOM errors disappear.
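For example, the same call from the question with the batch size halved (a sketch reusing the variable names from the question):
results = model.fit(X_train, y_train, batch_size=16, epochs=5,
                    validation_data=(X_valid, y_valid))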
Upvotes: 1