Mitchell van Zuylen

Reputation: 4135

Dst tensor is not initialized, even with a small batch size

There are many, many questions about this on SO. The answers to all of them appear to be rather straightforward, pointing out that it's almost certainly a memory error and that reducing the batch size should work.

In my case something else appears to be going on (or I seriously misunderstand how this works).

I have a large set of stimuli, like so:

train_x.shape # returns (2352, 131072, 2), about 2.3k stimuli of size 131072x2
train_y.shape # returns (2352,)
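A stand-in with the same shapes can be generated from random data (illustrative only; the real stimuli are not shown here):

import numpy as np

rng = np.random.default_rng(0)
# Illustrative stand-in data with the same shapes as the real stimuli (~2.5 GB as float32)
train_x = rng.random((2352, 131072, 2), dtype=np.float32)
train_y = rng.random(2352, dtype=np.float32)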

Of course, we can imagine that this might be too much. Indeed, creating a simple model and not setting any batch size results in the InternalError.

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten

model = Sequential([
    Flatten(input_shape=(131072, 2)), 
    Dense(128, activation=tf.nn.relu), 
    Dense(50, activation=tf.nn.relu), 
    Dense(1), 
])

model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])

model.fit(train_x, train_y, epochs=5)

This returns the following error:

InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized.

The logical thing to do is reduce the batch size. However, setting any value from 1 to 2000 simply returns the same error. This appears to imply that I don't have enough memory remaining to load a single stimulus. However...

Not just a memory error

If I manually cut up my dataset like so:

# Take first 20 stimuli
smaller_train_x = train_x[0:20,::] # shape is (20, 131072, 2)
smaller_train_y = train_y[0:20]    # shape is (20,)

If I try to fit the model to this smaller dataset, it works and does not return an error.

model.fit(smaller_train_x, smaller_train_y, epochs=5)

Thus, with a batch_size of a single stimulus I get a memory error, yet fitting on a manual slice of 20 stimuli works fine.

In short, the problem:

As I understand it,

# Load in one stimulus at a time
model.fit(train_x, train_y, epochs=5, batch_size=1)

should use ~20 times less memory than

# Load in 20 stimuli at a time
model.fit(smaller_train_x, smaller_train_y, epochs=5)

How, then, does the first return a memory error?

I'm running this in a Jupyter notebook with Python 3.8 and TensorFlow 2.10.0.

Upvotes: 0

Views: 604

Answers (1)

Vijay Mariappan

Reputation: 17201

Based on the following experiments, the size of the training array passed to model.fit(...) also matters, not just the batch_size.
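The calls below use a MemoryPrintingCallback; a minimal sketch of it, along the lines of the question linked at the end of this answer, assuming tf.config.experimental.get_memory_info("GPU:0") is available:

import tensorflow as tf

# Minimal sketch: report TensorFlow's current and peak GPU memory usage at the end of each epoch.
class MemoryPrintingCallback(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        info = tf.config.experimental.get_memory_info("GPU:0")
        gb = 1024 ** 3
        print(f"GPU memory details [current: {info['current'] / gb:.4f} gb, peak: {info['peak'] / gb:.4f} gb]")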

train_x: peak GPU memory increases with batch_size, but not linearly

model.fit(train_x, train_y, epochs=1, batch_size=10, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 2.7969 gb, peak: 3.0 gb]

model.fit(train_x, train_y, epochs=1, batch_size=100, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 2.7969 gb, peak: 3.0 gb]

model.fit(train_x, train_y, epochs=1, batch_size=1000, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 2.7969 gb, peak: 4.0 gb]

smaller_train_x: peak GPU memory is lower than in the previous case for the same batch size

model.fit(smaller_train_x, smaller_train_y, epochs=1, batch_size=10, callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 0.6348 gb]

Converting train_x to TFRecords seems optimal; GPU memory then grows roughly linearly with batch_size (a sketch of the conversion follows the links below).

dataset = dataset.batch(10)
model.fit(dataset, epochs=1,callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 0.6348 gb]

dataset = dataset.batch(100)
model.fit(dataset, epochs=1,callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 0.7228 gb]

dataset = dataset.batch(1000)
model.fit(dataset, epochs=1,callbacks= [MemoryPrintingCallback()])
#GPU memory details [current: 0.5 gb, peak: 1.6026 gb]

MemoryPrintingCallback(): How to print the maximum memory used during Keras's model.fit()

numpy-to-tfrecords: Numpy to TFrecords: Is there a more simple way to handle batch inputs from tfrecords?
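For completeness, a minimal sketch of how train_x and train_y could be written to and read back from a TFRecord file (illustrative only; the file name, feature keys, and helper names are assumptions, roughly following the linked numpy-to-tfrecords approach):

import numpy as np
import tensorflow as tf

# Write each (stimulus, label) pair as one serialized Example.
def write_tfrecords(train_x, train_y, path="train.tfrecords"):
    with tf.io.TFRecordWriter(path) as writer:
        for x, y in zip(train_x, train_y):
            feature = {
                "x": tf.train.Feature(bytes_list=tf.train.BytesList(
                    value=[x.astype(np.float32).tobytes()])),
                "y": tf.train.Feature(float_list=tf.train.FloatList(value=[float(y)])),
            }
            example = tf.train.Example(features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())

# Parse one serialized Example back into a (131072, 2) float32 tensor and its label.
def parse_example(serialized):
    spec = {
        "x": tf.io.FixedLenFeature([], tf.string),
        "y": tf.io.FixedLenFeature([], tf.float32),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    x = tf.reshape(tf.io.decode_raw(parsed["x"], tf.float32), (131072, 2))
    return x, parsed["y"]

write_tfrecords(train_x, train_y)
dataset = tf.data.TFRecordDataset("train.tfrecords").map(parse_example)
# dataset = dataset.batch(...) and model.fit(dataset, ...) as in the experiments above.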

Upvotes: 1
