Reputation: 465
I am trying to train an All-in-One convolutional model for face analysis in Keras, using the AFLW dataset, which is about 19.2 GB in size. The model summary is displayed successfully, but training never starts.
My computer has about 4 GB of RAM.
Loading pickle files
Loaded train, test and validation dataset
Loading test images
Loading validation images
dataset/adience.py:100: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
self.test_detection = self.test_dataset["is_face"].as_matrix()
Loaded all dataset and images
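As an aside, the `FutureWarning` in the log above can be fixed as the warning itself suggests: replace the deprecated pandas `.as_matrix()` with `.values`. A minimal sketch (the DataFrame here is a hypothetical stand-in for `self.test_dataset`; only the `"is_face"` column name is taken from the log):

```python
import pandas as pd

# Hypothetical stand-in for self.test_dataset from the log above
test_dataset = pd.DataFrame({"is_face": [1, 0, 1]})

# Deprecated: test_dataset["is_face"].as_matrix()
# .values returns the same NumPy array without the FutureWarning
test_detection = test_dataset["is_face"].values
```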
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 227, 227, 1) 0
__________________________________________________________________________________________________
conv2d_1 (Conv2D) (None, 55, 55, 96) 11712 input_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 55, 55, 96) 384 conv2d_1[0][0]
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D) (None, 27, 27, 96) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
conv2d_2 (Conv2D) (None, 27, 27, 256) 614656 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 27, 27, 256) 1024 conv2d_2[0][0]
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D) (None, 13, 13, 256) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
conv2d_3 (Conv2D) (None, 13, 13, 384) 885120 max_pooling2d_2[0][0]
__________________________________________________________________________________________________
conv2d_4 (Conv2D) (None, 13, 13, 384) 1327488 conv2d_3[0][0]
__________________________________________________________________________________________________
conv2d_5 (Conv2D) (None, 13, 13, 512) 1769984 conv2d_4[0][0]
__________________________________________________________________________________________________
conv2d_8 (Conv2D) (None, 6, 6, 256) 393472 max_pooling2d_1[0][0]
__________________________________________________________________________________________________
conv2d_9 (Conv2D) (None, 6, 6, 256) 393472 conv2d_3[0][0]
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D) (None, 6, 6, 512) 0 conv2d_5[0][0]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 6, 6, 1024) 0 conv2d_8[0][0]
conv2d_9[0][0]
max_pooling2d_4[0][0]
__________________________________________________________________________________________________
conv2d_10 (Conv2D) (None, 6, 6, 256) 262400 concatenate_1[0][0]
__________________________________________________________________________________________________
flatten_2 (Flatten) (None, 9216) 0 conv2d_10[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 2048) 18876416 flatten_2[0][0]
__________________________________________________________________________________________________
dropout_3 (Dropout) (None, 2048) 0 dense_3[0][0]
__________________________________________________________________________________________________
dense_11 (Dense) (None, 512) 1049088 dropout_3[0][0]
__________________________________________________________________________________________________
dropout_10 (Dropout) (None, 512) 0 dense_11[0][0]
__________________________________________________________________________________________________
detection_probablity (Dense) (None, 2) 1026 dropout_10[0][0]
==================================================================================================
Total params: 25,586,242
Trainable params: 25,585,538
Non-trainable params: 704
__________________________________________________________________________________________________
Epoch 1/10
It says Epoch 1/10 but then stops. Is this a limitation of my computer's processing power?
Upvotes: 1
Views: 151
Reputation: 550
If it starts running like that, then it probably has enough RAM to run properly. Check your resource monitor to see how much memory is available, and also whether there is any CPU usage. If the CPU is busy, it is probably just training very slowly.
That is a fairly large model, so it could take an extremely long time to train on a modest CPU.
Make sure your Keras verbosity is set to 1 so that it prints information after every batch. That is the default, so it should already be set that way unless you changed it:
model.fit(verbose=1)
Also try turning the batch size down to 1 and see whether you get any output, since smaller batches complete faster.
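To see why a smaller batch size produces output sooner, here is a pure-Python sketch of batching (the names are illustrative, not from your code): each training step only processes one batch, so with `batch_size=1` the first per-batch progress update appears after a single sample instead of after a full default-sized batch.

```python
def batch_generator(samples, batch_size):
    """Yield successive batches; only batch_size items are processed per step."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

# Illustrative data standing in for training samples
samples = list(range(10))

# batch_size=1: ten steps of one sample each, so the first
# progress line is printed after just one sample is processed
batches = list(batch_generator(samples, 1))
```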
If it is running properly but slowly, your best bet is to get a GPU to run it on. If you can't do that, you can try compiling TensorFlow from source to make sure it uses all of your CPU's instruction sets (and, optionally, the MKL library), which could speed it up somewhat.
Upvotes: 1