Reputation: 822
I'm using Keras
framework to build a stacked LSTM
model as follows:
model.add(layers.LSTM(units=32,
                      batch_input_shape=(1, 100, 64),
                      stateful=True,
                      return_sequences=True))
model.add(layers.LSTM(units=32, stateful=True, return_sequences=True))
model.add(layers.LSTM(units=32, stateful=True, return_sequences=False))
model.add(layers.Dense(1))
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(train_dataset,
          train_labels,
          epochs=1,
          validation_split=0.2,
          verbose=1,
          batch_size=1,
          shuffle=False)
Knowing that the default batch_size for model.fit, model.predict and model.evaluate is 32, the model forces me to change this default batch_size to the same batch_size value used in batch_input_shape (batch_size, time_steps, input_dims).

My questions are:

1. What is the difference between passing the batch_size into batch_input_shape versus into model.fit?
2. Could I train with a batch_size of, let's say, 10 and then evaluate on a single batch (rather than 10 batches) if I pass the batch_size into the structure of the LSTM layer through batch_input_shape?

Upvotes: 3
Views: 3365
Reputation: 21
When the LSTM layer is in stateful mode, the batch size must be given and cannot be None. This is because a stateful LSTM needs to know how to carry the hidden states from the batch at timestep t-1 over to the batch at timestep t.
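To see why, here is a minimal NumPy sketch (a toy stand-in, not the actual LSTM math) of how a stateful layer carries its state between calls: the saved state has shape (batch_size, units), one row per sequence, so the next batch must have the same first dimension for the rows to line up.

```python
import numpy as np

units = 32
batch_size = 4

# Hidden state kept between calls -- one row per sequence in the batch.
state = np.zeros((batch_size, units))

def step(batch, state):
    # Toy stand-in for one stateful step: the new state is computed
    # from the previous state, so the shapes must match row-for-row.
    if batch.shape[0] != state.shape[0]:
        raise ValueError("stateful layer: batch size changed between calls")
    return np.tanh(batch @ np.ones((batch.shape[1], units)) + state)

state = step(np.random.normal(size=(batch_size, 8)), state)   # OK
# step(np.random.normal(size=(2, 8)), state)  # would raise ValueError
```

A stateless layer simply discards the state after each batch, which is why it can accept any batch size.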
Upvotes: 2
Reputation: 8605
When you create a Sequential()
model it is defined to support any batch size. In particular, in TensorFlow 1.*
the input is a placeholder that has None
as the first dimension:
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
print(model.inputs[0].get_shape().as_list()) # [None, 2] <-- supports any batch size
print(model.inputs[0].op.type == 'Placeholder') # True
If you use tf.keras.InputLayer()
you can define a fixed batch size like this:
import tensorflow as tf
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer((2,), batch_size=50)) # <-- same as using batch_input_shape
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
print(model.inputs[0].get_shape().as_list()) # [50, 2] <-- supports only batch_size==50
print(model.inputs[0].op.type == 'Placeholder') # True
The batch_size argument of the model.fit()
method is used to split your data into batches. For example, if you use InputLayer()
to define a fixed batch size while providing a different batch size value to the model.fit()
method, you will get a ValueError
:
import tensorflow as tf
import numpy as np
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.InputLayer((2,), batch_size=2)) # <--batch_size==2
model.add(tf.keras.layers.Dense(units=2, input_shape=(2, )))
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy')
x_train = np.random.normal(size=(10, 2))
y_train = np.array([[0, 1] for _ in range(10)])
model.fit(x_train, y_train, batch_size=3) # <--batch_size==3
This will raise:
ValueError: The batch_size argument value 3 is incompatible with the specified batch size of your Input Layer: 2
To summarize: if you define a batch size of None
, you can pass any number of samples for training or evaluation, even all samples at once without splitting into batches (though if the data is too big you will run out of memory). If you define a fixed batch size, you will have to use that same fixed batch size for training and evaluation.
Upvotes: 1