peter bence

Reputation: 822

How to use Keras LSTM batch_input_size properly

I'm using Keras framework to build a stacked LSTM model as follows:

model.add(layers.LSTM(units=32,
                      batch_input_shape=(1, 100, 64),
                      stateful=True,
                      return_sequences=True))
model.add(layers.LSTM(units=32, stateful=True, return_sequences=True))
model.add(layers.LSTM(units=32, stateful=True, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_dataset,
          train_labels,
          epochs=1,
          validation_split = 0.2,
          verbose=1,
          batch_size=1,
          shuffle=False)

Knowing that the default batch_size for model.fit, model.predict and model.evaluate is 32, the model forces me to change this default batch_size to the same batch_size value used in batch_input_shape (batch_size, time_steps, input_dims).

My questions are:

  1. What is the difference between passing the batch_size into batch_input_shape or into the model.fit?
  2. Could I train with a batch_size of, let's say, 10, and then evaluate on a single sample (rather than a batch of 10) if I pass the batch_size into the structure of the LSTM layer through batch_input_shape?

Upvotes: 3

Views: 3365

Answers (2)

Dor Livne

Reputation: 21

When the LSTM layer is in stateful mode, the batch size must be given explicitly and cannot be None. This is because a stateful LSTM needs to carry the hidden states of the batch at timestep t-1 over to the batch at timestep t, so the state tensors must be allocated with a known batch dimension.
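A minimal sketch of this (the layer sizes here are arbitrary, chosen just for illustration): with stateful=True, the first layer must receive batch_input_shape, and predict/fit must then be called with that same batch size.

```python
import numpy as np
import tensorflow as tf

# Stateful mode requires a fixed batch dimension: the layer keeps
# per-sample hidden states between batches, so it must know how many
# samples each batch will contain.
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.LSTM(units=4,
                               batch_input_shape=(2, 5, 3),  # (batch, time_steps, features)
                               stateful=True))
model.compile(loss='mean_squared_error', optimizer='adam')

x = np.random.normal(size=(2, 5, 3)).astype('float32')
out = model.predict(x, batch_size=2)  # must match the fixed batch size of 2
print(out.shape)  # (2, 4)
```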

Upvotes: 2

Vlad

Reputation: 8605

When you create a Sequential() model it is defined to support any batch size. In particular, in TensorFlow 1.* the input is a placeholder that has None as the first dimension:

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
print(model.inputs[0].get_shape().as_list()) # [None, 2] <-- supports any batch size
print(model.inputs[0].op.type == 'Placeholder') # True

If you use tf.keras.InputLayer() you can define a fixed batch size like this:

import tensorflow as tf

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer((2,), batch_size=50)) # <-- same as using batch_input_shape
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
print(model.inputs[0].get_shape().as_list()) # [50, 2] <-- supports only batch_size==50
print(model.inputs[0].op.type == 'Placeholder') # True

The batch size of model.fit() method is used to split your data to batches. For example, if you use InputLayer() and define a fixed batch size while providing different value of a batch size to the model.fit() method you will get ValueError:

import tensorflow as tf
import numpy as np

model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer((2,), batch_size=2)) # <--batch_size==2
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy')
x_train = np.random.normal(size=(10, 2))
y_train = np.array([[0, 1] for _ in range(10)])

model.fit(x_train, y_train, batch_size=3) # <--batch_size==3 

This will raise: ValueError: The batch_size argument value 3 is incompatible with the specified batch size of your Input Layer: 2

To summarize: If you define a batch size None you can pass any number of samples for training or evaluation, even all samples at once without splitting to batches (if the data is too big you will get OutOfMemoryError). If you define a fixed batch size you will have to use the same fixed batch size for training and evaluation.
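Regarding the asker's second question, one common workaround (a sketch, not the only option) is to train with the fixed batch size, then rebuild the identical architecture with a batch size of 1 and copy the trained weights over with get_weights()/set_weights() for single-sample evaluation:

```python
import numpy as np
import tensorflow as tf

def build(batch_size):
    # Same architecture each time; only the fixed batch size differs.
    m = tf.keras.models.Sequential()
    m.add(tf.keras.layers.InputLayer((2,), batch_size=batch_size))
    m.add(tf.keras.layers.Dense(units=2))
    m.compile(optimizer='adam', loss='mean_squared_error')
    return m

train_model = build(batch_size=10)
x_train = np.random.normal(size=(10, 2)).astype('float32')
y_train = np.random.normal(size=(10, 2)).astype('float32')
train_model.fit(x_train, y_train, batch_size=10, verbose=0)

# Rebuild with batch_size=1 and transfer the trained weights.
eval_model = build(batch_size=1)
eval_model.set_weights(train_model.get_weights())
pred = eval_model.predict(x_train[:1], batch_size=1)
print(pred.shape)  # (1, 2)
```

This works because the weight shapes do not depend on the batch dimension, only the input placeholders do.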

Upvotes: 1
