Samuel Mideksa

Reputation: 465

Couldn't train a model in Keras

I am trying to train an all-in-one convolutional model for face analysis in Keras using the AFLW dataset, which is about 19.2 GB in size. The script prints the model summary successfully, but training does not progress.

My computer has about 4 GB of RAM.

Loading pickle files
Loaded train, test and validation dataset
Loading test images
Loading validation images
dataset/adience.py:100: FutureWarning: Method .as_matrix will be removed in a future version. Use .values instead.
  self.test_detection = self.test_dataset["is_face"].as_matrix()
Loaded all dataset and images
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 227, 227, 1)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 55, 55, 96)   11712       input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 55, 55, 96)   384         conv2d_1[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 27, 27, 96)   0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 27, 27, 256)  614656      max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 27, 27, 256)  1024        conv2d_2[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)  (None, 13, 13, 256)  0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_3 (Conv2D)               (None, 13, 13, 384)  885120      max_pooling2d_2[0][0]            
__________________________________________________________________________________________________
conv2d_4 (Conv2D)               (None, 13, 13, 384)  1327488     conv2d_3[0][0]                   
__________________________________________________________________________________________________
conv2d_5 (Conv2D)               (None, 13, 13, 512)  1769984     conv2d_4[0][0]                   
__________________________________________________________________________________________________
conv2d_8 (Conv2D)               (None, 6, 6, 256)    393472      max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
conv2d_9 (Conv2D)               (None, 6, 6, 256)    393472      conv2d_3[0][0]                   
__________________________________________________________________________________________________
max_pooling2d_4 (MaxPooling2D)  (None, 6, 6, 512)    0           conv2d_5[0][0]                   
__________________________________________________________________________________________________
concatenate_1 (Concatenate)     (None, 6, 6, 1024)   0           conv2d_8[0][0]                   
                                                                 conv2d_9[0][0]                   
                                                                 max_pooling2d_4[0][0]            
__________________________________________________________________________________________________
conv2d_10 (Conv2D)              (None, 6, 6, 256)    262400      concatenate_1[0][0]              
__________________________________________________________________________________________________
flatten_2 (Flatten)             (None, 9216)         0           conv2d_10[0][0]                  
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 2048)         18876416    flatten_2[0][0]                  
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 2048)         0           dense_3[0][0]                    
__________________________________________________________________________________________________
dense_11 (Dense)                (None, 512)          1049088     dropout_3[0][0]                  
__________________________________________________________________________________________________
dropout_10 (Dropout)            (None, 512)          0           dense_11[0][0]                   
__________________________________________________________________________________________________
detection_probablity (Dense)    (None, 2)            1026        dropout_10[0][0]                 
==================================================================================================
Total params: 25,586,242
Trainable params: 25,585,538
Non-trainable params: 704
__________________________________________________________________________________________________
Epoch 1/10

It prints Epoch 1/10 and then stops. Is this caused by my computer's limited computational resources?

Upvotes: 1

Views: 151

Answers (1)

Jeremy Bare

Reputation: 550

If it gets as far as starting the first epoch, it probably has enough RAM to run. Check your resource monitor to see how much memory is available and whether there is any CPU usage. If the CPU is busy, the model is probably just training very slowly.

That is a fairly large model (about 25.6 million parameters), so it could take an extremely long time to train on a small CPU.
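To put the model size in perspective, here is a rough estimate built from the parameter counts in the summary you posted. It assumes 32-bit floats (the usual Keras default) and, as a pure assumption since you did not say which optimizer you use, an Adam-style optimizer that keeps two extra values per weight. It ignores activations and the images already loaded into memory, so treat it as a lower bound, not a measurement.

```python
# Parameter counts copied from the model summary in the question.
PARAMS_PER_LAYER = {
    "conv2d_1": 11712,
    "batch_normalization_1": 384,
    "conv2d_2": 614656,
    "batch_normalization_2": 1024,
    "conv2d_3": 885120,
    "conv2d_4": 1327488,
    "conv2d_5": 1769984,
    "conv2d_8": 393472,
    "conv2d_9": 393472,
    "conv2d_10": 262400,
    "dense_3": 18876416,
    "dense_11": 1049088,
    "detection_probablity": 1026,
}

BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as float32


def weights_mib(num_params, copies=1):
    """Memory in MiB for `copies` float32 arrays of `num_params` values."""
    return num_params * BYTES_PER_FLOAT32 * copies / 1024 ** 2


total_params = sum(PARAMS_PER_LAYER.values())
print("total parameters:", total_params)
print("weights only (MiB):", round(weights_mib(total_params), 1))
# Assumption: an Adam-style optimizer holds 2 extra values per weight.
print("weights + optimizer state (MiB):",
      round(weights_mib(total_params, copies=3), 1))
```

The weights alone come to roughly 100 MiB, well under 4 GB, which fits with the idea that the run is slow rather than out of memory. Most of the parameters sit in dense_3, so the convolutions on 227x227 inputs are the likelier source of the CPU time.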

Make sure the Keras verbosity is set to 1 so that it prints progress for every batch. That is the default, so it should already be set that way unless you changed it.

model.fit(x_train, y_train, verbose=1)  # x_train, y_train are placeholders for your training data

Also try reducing the batch size to 1 and see whether you get any output, since a smaller batch should complete faster.
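Once you see the first progress update, you can time one step and extrapolate to a full epoch. The sketch below only does that arithmetic; the sample count and the per-step timings are made-up placeholders, so substitute the numbers you actually observe.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Number of batches Keras has to process in one epoch."""
    return math.ceil(num_samples / batch_size)


def estimated_epoch_minutes(num_samples, batch_size, seconds_per_step):
    """Extrapolate one epoch's duration from a measured time per step."""
    return steps_per_epoch(num_samples, batch_size) * seconds_per_step / 60


NUM_SAMPLES = 20000  # hypothetical: use the size of your training set

# Hypothetical timings: (batch size, seconds you measured for one step).
for batch_size, seconds_per_step in [(1, 0.5), (32, 12.0)]:
    minutes = estimated_epoch_minutes(NUM_SAMPLES, batch_size, seconds_per_step)
    print(f"batch_size={batch_size}: "
          f"{steps_per_epoch(NUM_SAMPLES, batch_size)} steps, "
          f"about {minutes:.0f} minutes per epoch")
```

A batch size of 1 gives you the first progress line sooner, but it also means many more steps per epoch, so use it to confirm that training is alive and not as the final setting.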

If it is running properly but slowly, your best bet is to get a GPU to run it on. If you can't do that, you can try compiling TensorFlow from source so that the build uses all of your CPU's instruction sets, optionally with the MKL library, which could speed it up somewhat.

Upvotes: 1
