Reputation: 5
Hello everyone. I'm trying to train my first neural network.
When I try to train it, this error appears:
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[502656,128] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
From what I've read, I understand this is because my video card has too little memory (GTX 1050, 2 GB).
Does that mean I can't use the video card here at all?
Or can I somehow feed the dataset to the video card in portions?
Code:
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
import numpy as np

batch_size = 1
num_classes = 3
epochs = 2

# input image dimensions
img_rows, img_cols = 135, 240

# Dataset is my own loader class (not shown here)
dataset = Dataset()
x_train, y_train = dataset.LoadDataset()
x_train = x_train[0]
y_train = y_train[0]

# reshape to (samples, rows, cols, channels) and normalize to [0, 1]
x_train = np.array(x_train).reshape(10000, img_rows, img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_train = x_train / 255

model = Sequential()
model.add(Conv2D(32, kernel_size=(1, 1),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1)
model.save("First.model")

score = model.evaluate(x_train, y_train, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
Upvotes: 0
Views: 526
Reputation: 631
gradient-checkpointing is a library developed by OpenAI to reduce a neural network's memory footprint. It does this by saving only some tensors during the forward pass (where the loss is calculated) and recomputing the rest during the backward pass (where gradients are calculated by backpropagating the loss).
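To illustrate the technique itself (this is not the OpenAI library's own API): recent versions of TensorFlow ship a built-in wrapper, tf.recompute_grad, that applies the same idea to a block of layers. Below is a minimal sketch assuming TF 2.x eager mode; the block and tensor sizes are made up for the example.
import tensorflow as tf

# A block whose intermediate activations we'd rather not keep in GPU
# memory during the forward pass (layer sizes are arbitrary here).
block = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
])

# tf.recompute_grad wraps the call so that, during backprop, the block's
# activations are recomputed from its inputs instead of being stored.
checkpointed_block = tf.recompute_grad(lambda x: block(x))

x = tf.random.normal([32, 256])
with tf.GradientTape() as tape:
    y = checkpointed_block(x)
    loss = tf.reduce_sum(y)

# Gradients come out as usual; memory has been traded for extra compute.
grads = tape.gradient(loss, block.trainable_variables)
The trade-off is exactly the one described above: less stored activation memory in exchange for running the block's forward pass a second time.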
The library is advertised on reddit as letting you train models needing 10x the memory in exchange for a 20% higher computational cost. However, in my experience trying it on a medium-sized CNN with the best settings I could find, it let me train with 2x the memory at a 30% higher computational cost. The peak-memory-per-iteration graph on the project's GitHub page suggests you may only see the larger gains with very large networks.
Extra: 2 GB is too little GPU memory to train most neural networks; this article from 2018 recommends a minimum of 6 GB. If you can, I recommend getting a higher-end GPU with more memory and processing power. Alternatively, you can use a cloud computing service: Google Cloud offers a free trial with $300 of credit to spend on cloud services over a year.
Upvotes: 0