Reputation: 905
I am training my network using Keras (version 2.1) on the TensorFlow backend. I have tried many suggestions available on the internet, but did not find a solution.
Training set and labels: 26721 images, each of shape (32, 32, 1); labels of shape (26721, 10)
Validation set and labels: 6680 images, each of shape (32, 32, 1); labels of shape (6680, 10)
This is my model so far; I am using Python 3.
import numpy as np
import tensorflow as tf
import keras
from keras import backend as K
from keras.models import Sequential
from keras.layers import (Convolution2D, MaxPooling2D, BatchNormalization,
                          Activation, Flatten, Dense)
from keras.callbacks import CSVLogger, EarlyStopping, ModelCheckpoint

def CNN(input_, num_classes):
    model = Sequential()
    model.add(Convolution2D(16, kernel_size=(7, 7), padding='same',
                            input_shape=input_))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2), strides=(1, 1), padding='same'))
    model.add(Convolution2D(64, (3, 3), padding='same'))
    model.add(BatchNormalization())
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(3, 3), strides=(1, 1), padding='same'))
    model.add(Flatten())
    model.add(Dense(96))
    model.add(Activation('relu'))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    return model
model = CNN(image_size, num_classes)
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.SGD(lr=0.01),
              metrics=['accuracy'])
model.summary()
csv_logger = CSVLogger('training.log')
early_stop = EarlyStopping('val_acc', patience=200, verbose=1)
model_checkpoint = ModelCheckpoint(model_save_path,
                                   'val_acc', verbose=0,
                                   save_best_only=True)
model_callbacks = [early_stop, model_checkpoint, csv_logger]
# print "len(train_dataset) ", len(train_dataset)
print("int(len(train_dataset)/batch_size) ", int(len(train_dataset)/batch_size))
K.get_session().run(tf.global_variables_initializer())
model.fit_generator(train,
                    steps_per_epoch=np.ceil(len(train_dataset)/batch_size),
                    epochs=num_epochs,
                    verbose=1,
                    validation_data=valid,
                    validation_steps=batch_size,
                    callbacks=model_callbacks)
Model Summary:
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 32, 32, 16)        800
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 16)        64
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 16)        0
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 32, 32, 16)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 64)        9280
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 64)        256
_________________________________________________________________
activation_2 (Activation)    (None, 32, 32, 64)        0
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 64)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 65536)             0
_________________________________________________________________
dense_1 (Dense)              (None, 96)                6291552
_________________________________________________________________
activation_3 (Activation)    (None, 96)                0
_________________________________________________________________
dense_2 (Dense)              (None, 10)                970
_________________________________________________________________
activation_4 (Activation)    (None, 10)                0
=================================================================
Total params: 6,302,922
Trainable params: 6,302,762
Non-trainable params: 160
I am feeding images to the model according to the batch size. This is my generator function:
# Generate images according to batch size
def gen(dataset, labels, batch_size):
    images = []
    digits = []
    i = 0
    while True:
        images.append(dataset[i])
        digits.append(labels[i])
        i += 1
        if i == batch_size:
            yield (np.array(images), np.array(digits))
            images = []
            digits = []
        # Generate remaining images also
        if i == len(dataset):
            yield (np.array(images), np.array(digits))
            images, digits = [], []
            i = 0
train = gen(train_data, train_labels, batch_size)
valid = gen(valid_data, valid_lables, batch_size)
Error log on terminal:
Please check this link for the complete error: Terminal Output
Can anyone please help me? What am I doing wrong here?
Thanks in advance.
Upvotes: 1
Views: 5421
Reputation: 1469
From the logs you can see that the memory is already full before edge_1094_loss is allocated. Check the values of Limit and InUse.
This may be because memory is still being held by older models. A quick hack is to simply kill the process, which releases all the memory consumed by older models that were somehow not garbage collected.
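If restarting the process each time is inconvenient, Keras also exposes keras.backend.clear_session(), which destroys the current TensorFlow graph and releases the memory held by models built on it. A minimal sketch, reusing the CNN, image_size, and num_classes names from the question:
from keras import backend as K

# Destroy the current TensorFlow graph so models that are no longer
# referenced stop holding on to memory, then rebuild on a fresh graph.
K.clear_session()
model = CNN(image_size, num_classes)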
Upvotes: 0
Reputation: 681
You are training your network on your entire training set, which is too big to fit in memory and too large for your GPU. Note that your generator's if i == batch_size check is true only once: after the first yield, i keeps growing past batch_size, so the "remaining images" branch eventually yields nearly the whole dataset as one giant batch.
The standard in machine learning is to create small batches of your data and train on those. Batch sizes are usually 16, 32, 64, or some other power of two, but they can be anything; you usually have to find a good batch size through cross-validation. A corrected generator is sketched below.
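As an illustration only, here is a minimal sketch of a batching generator that always yields fixed-size batches; it reuses the names from the question and walks the data with slices instead of the one-shot i == batch_size check:
import numpy as np

# Minimal sketch: walk the dataset in steps of batch_size, yielding one
# batch per step; the last batch of an epoch may be smaller than the rest.
def gen(dataset, labels, batch_size):
    n = len(dataset)
    while True:  # loop forever, as fit_generator expects
        for start in range(0, n, batch_size):
            end = start + batch_size
            yield np.array(dataset[start:end]), np.array(labels[start:end])
With a generator like this, steps_per_epoch=np.ceil(len(train_dataset)/batch_size) is correct, but validation_steps should likewise be the number of validation batches per epoch, not batch_size.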
Upvotes: 3