jiashenC

Reputation: 1932

How does Keras work when creating a model and doing prediction?

I am using memory_profiler to examine the memory usage when my laptop runs Keras. Here is the memory usage output, line by line.

Line #    Mem usage    Increment   Line Contents
16        187.7 MiB      0.0 MiB       model = Sequential()
17        188.4 MiB      0.7 MiB       model.add(Dense(4096, input_shape=(7680,)))
18        189.2 MiB      0.7 MiB       model.add(BatchNormalization(input_shape=(4096,)))
19        189.2 MiB      0.0 MiB       model.add(Activation('relu', input_shape=(4096,)))
20                             
21        189.3 MiB      0.1 MiB       model.add(Dense(4096, input_shape=(4096,)))
22        190.0 MiB      0.7 MiB       model.add(BatchNormalization(input_shape=(4096,)))
23        190.0 MiB      0.1 MiB       model.add(Activation('relu', input_shape=(4096,)))
24                             
25        190.0 MiB      0.0 MiB       model.add(Dense(51, input_shape=(4096,)))
26        190.8 MiB      0.8 MiB       model.add(BatchNormalization(input_shape=(51,)))
27        190.8 MiB      0.0 MiB       model.add(Activation('softmax', input_shape=(51,)))
28        191.0 MiB      0.2 MiB       test_x = np.random.rand(7680)
29        399.9 MiB    208.8 MiB       output = model.predict(np.array([test_x]))
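For reference, here is roughly the script being profiled, reconstructed from the trace above (the @profile decorator comes from memory_profiler):

    import numpy as np
    from keras.models import Sequential
    from keras.layers import Dense, BatchNormalization, Activation
    from memory_profiler import profile

    @profile
    def build_and_predict():
        model = Sequential()
        model.add(Dense(4096, input_shape=(7680,)))
        model.add(BatchNormalization(input_shape=(4096,)))
        model.add(Activation('relu', input_shape=(4096,)))

        model.add(Dense(4096, input_shape=(4096,)))
        model.add(BatchNormalization(input_shape=(4096,)))
        model.add(Activation('relu', input_shape=(4096,)))

        model.add(Dense(51, input_shape=(4096,)))
        model.add(BatchNormalization(input_shape=(51,)))
        model.add(Activation('softmax', input_shape=(51,)))

        test_x = np.random.rand(7680)
        output = model.predict(np.array([test_x]))
        return output

    if __name__ == '__main__':
        build_and_predict()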

I am trying to figure out:

  1. Why does the memory usage increase while the model does prediction? I guess it is because memory is allocated on the GPU?
  2. Why does the memory not increase while the model is being created?

Upvotes: 1

Views: 121

Answers (1)

Daniel Möller

Reputation: 86600

Keras (and its backends) works with "graphs" that connect "tensors".

During creation, the tensors themselves are just symbols: they have shapes and connections to other tensors, but no values.

There are indeed some values in the layers: the model's weights. Their shapes depend on the input shape and the number of units, but not on the batch size.

So, during creation, you have very little data in the model: just the model's weights, plus tensor representations indicating how and where the data should flow. (TensorFlow is quite a fitting name when you think about it.)
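You can see this with a freshly created, never-used model (a quick sketch, assuming the TensorFlow backend and the layer sizes from the question):

    from keras.models import Sequential
    from keras.layers import Dense

    model = Sequential()
    model.add(Dense(4096, input_shape=(7680,)))
    model.add(Dense(51, activation='softmax'))

    print(model.output)           # a symbolic tensor: shape and dtype, but no values
    print(model.count_params())   # number of weights, determined only by shapes and units
    print([w.shape for w in model.get_weights()])  # the weight arrays already exist
    # note: no batch dimension appears anywhere in the weight shapes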

In TensorFlow, the input data is represented by what they call placeholders: empty containers that will later receive your data.

Only when you start using the model (when you pass it actual input data) will it have data and use space. (I'm not sure whether it promptly releases this space later.)
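A bare TensorFlow sketch of the same idea (TF 1.x style, matching the backend of that time; the names are illustrative):

    import numpy as np
    import tensorflow as tf

    x = tf.placeholder(tf.float32, shape=(None, 7680))  # empty container, no data yet
    w = tf.Variable(tf.random_normal((7680, 4096)))      # weights live in variables
    y = tf.matmul(x, w)                                   # symbolic result, still no values

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # only here does the placeholder receive real data and memory get used
        out = sess.run(y, feed_dict={x: np.random.rand(2, 7680).astype('float32')})
        print(out.shape)  # (2, 4096)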

There are a few reasons why this is so much bigger than the empty model:

  • First, your data has a batch size, which is not part of the model. The bigger the batch, the more memory it will consume (see the sketch after this list).
  • It seems (this needs an expert's confirmation) that TensorFlow, at least, allocates the space for all tensors in the model at once. So the input, the first dense output, the first normalization, the first activation, the next dense output, etc., all occupy space at the same time. (I came to this conclusion while working on a GPU: it complained about a certain tensor not fitting in memory. I checked its shape and saw it belonged to a later layer. As I gradually reduced the batch size, tensors appearing earlier in the model became the guilty ones, until the batch was small enough.)
  • I'm not sure whether this applies to prediction, but there is also all the space needed for the gradient calculations.
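To get a feeling for the batch size effect, here is a back-of-the-envelope estimate of the intermediate tensors for the model in the question (a sketch only: float32, ignoring any copies made by BatchNormalization/Activation and whatever the backend allocates internally):

    # rough size of the input plus the three dense outputs for one batch
    def activation_bytes(batch_size):
        sizes = [7680, 4096, 4096, 51]                  # feature sizes along the model
        return sum(batch_size * s * 4 for s in sizes)   # 4 bytes per float32 value

    for batch in (1, 32, 256):
        print(batch, activation_bytes(batch) / 1024 ** 2, 'MiB')

The estimate grows linearly with the batch size, while the weights stay the same.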

Hint for the model:

  • Only the first layer needs input_shape; the others are inferred automatically (and the values you passed are probably ignored).
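For example, the model from the question could be written like this (a sketch; it should build the exact same model):

    from keras.models import Sequential
    from keras.layers import Dense, BatchNormalization, Activation

    model = Sequential()
    model.add(Dense(4096, input_shape=(7680,)))  # only here is input_shape needed
    model.add(BatchNormalization())
    model.add(Activation('relu'))

    model.add(Dense(4096))                       # shape inferred from the previous layer
    model.add(BatchNormalization())
    model.add(Activation('relu'))

    model.add(Dense(51))
    model.add(BatchNormalization())
    model.add(Activation('softmax'))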

Upvotes: 1
