Arkleseisure

Reputation: 438

Why does keras model.fit use so much memory despite using allow_growth=True?

Thanks mostly to this question, I have been able to stop TensorFlow from allocating memory I didn't want allocated. However, I have recently found that despite using set_session with allow_growth=True, calling model.fit still allocates nearly all of the GPU memory, and I can no longer use it for the rest of my program, even after the function has returned and the model, being a local variable, should no longer hold any allocated memory. Here is some example code demonstrating this:

from numpy import array
from keras import Input, Model
from keras.layers import Conv2D, Dense, Flatten
from keras.optimizers import SGD

# stops keras/tensorflow from allocating all the GPU's memory immediately
from tensorflow.compat.v1.keras.backend import set_session
from tensorflow.compat.v1 import Session, ConfigProto, GPUOptions
tf_config = ConfigProto(gpu_options=GPUOptions(allow_growth=True))
session = Session(config=tf_config)
set_session(session)


# makes the neural network
def make_net():
    input = Input((2, 3, 3))
    conv = Conv2D(256, (1, 1))(input)
    flattened_input = Flatten()(conv)
    output = Dense(1)(flattened_input)
    model = Model(inputs=input, outputs=output)
    sgd = SGD(0.2, 0.9)
    model.compile(sgd, 'mean_squared_error')
    model.summary()
    return model


def make_data(input_data, target_output):
    input_data.append([[[0 for i in range(3)] for j in range(3)] for k in range(2)])
    target_output.append(0)


def main():
    data_amount = 4096
    input_data = []
    target_output = []
    model = make_net()
    for i in range(data_amount):
        make_data(input_data, target_output)
    model.fit(array(input_data), array(target_output), batch_size=len(input_data))
    return


while True:
    main()

When I run this code with the PyCharm debugger, I find that the GPU RAM used stays at around 0.1 GB until I run model.fit for the first time, at which point memory usage shoots up to 3.2 GB of my 4 GB of GPU RAM. I have also noticed that memory usage doesn't increase after the first call to model.fit, and that if I remove the convolutional layer from my network, the memory increase doesn't happen at all. Could someone please shed some light on my problem?

UPDATE: Setting per_process_gpu_memory_fraction in GPUOptions to 0.1 helps limit the effect in the code included, but not in my actual program. A better solution would still be helpful.
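
For reference, that workaround is just the session setup at the top of the example with the fraction added (a minimal sketch; 0.1 is the value mentioned in this update):

# Same tf.compat.v1 session setup as above, but additionally capped at
# ~10% of the GPU's memory (about 0.4 GB on a 4 GB card).
from tensorflow.compat.v1.keras.backend import set_session
from tensorflow.compat.v1 import Session, ConfigProto, GPUOptions

tf_config = ConfigProto(
    gpu_options=GPUOptions(allow_growth=True, per_process_gpu_memory_fraction=0.1)
)
set_session(Session(config=tf_config))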

Upvotes: 8

Views: 5514

Answers (2)

Xu Qiushi

Reputation: 1161

I used to face this problem, and I found a solution from someone whose post I can no longer find; I paste it below. In fact, I found that even with allow_growth=True, TensorFlow still seems to use all of your memory, so you should just set an explicit maximum limit instead.

Try this:

import tensorflow as tf

gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    # Restrict TensorFlow to only use the first GPU
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, False)
            tf.config.experimental.set_virtual_device_configuration(
                gpu,
                [
                    tf.config.experimental.VirtualDeviceConfiguration(
                        memory_limit=12288  # limit in MB; set this to fit your GPU
                    )
                ],
            )
        tf.config.experimental.set_visible_devices(gpus[0], "GPU")
        logical_gpus = tf.config.experimental.list_logical_devices("GPU")
        print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
    except RuntimeError as e:
        # Visible devices must be set before GPUs have been initialized
        print(e)

Upvotes: 6

Yannick Funk

Reputation: 1581

Training with SGD and the whole training set in a single batch can (depending on your input data) be very memory-intensive. Try lowering your batch_size (e.g. to 8, 16 or 32).
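
For example, with the code from the question the change is just the batch_size argument (a sketch assuming the same model and data arrays):

# Same fit call as in the question, but splitting the 4096 samples into
# mini-batches of 32 instead of one full-data batch.
model.fit(array(input_data), array(target_output), batch_size=32)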

Upvotes: 0
