Reputation: 363
I'm trying to build a multitask image-captioning model, which consists of two separate encoder-decoder models with LSTMs, each taking inputs from a different dataset; the outputs of the LSTMs are combined via concatenate, and the output of the concatenation layer is then passed to Dense layers. Here is the model code:
from tensorflow.keras.layers import Input, Dropout, Dense, RepeatVector, Embedding, LSTM, concatenate
from tensorflow.keras.models import Model

EMBEDDING_DIM = 256  # embedding size (matches the Dense output shapes in the summary below)

def define_model(vocab_size1, max_length1, vocab_size2, max_length2):
    # first
    inputs1 = Input(shape=(4096,))
    print(inputs1.shape)
    fe1_1 = Dropout(0.5)(inputs1)
    fe2_1 = Dense(EMBEDDING_DIM, activation='relu')(fe1_1)
    fe3_1 = RepeatVector(max_length1)(fe2_1)
    inputs2 = Input(shape=(max_length1,))
    print(inputs2.shape)
    emb2_1 = Embedding(vocab_size1, EMBEDDING_DIM, mask_zero=True)(inputs2)
    merged1 = concatenate([fe3_1, emb2_1], name='concat1')
    lm2_1 = LSTM(500, return_sequences=False)(merged1)
    # second
    inputs3 = Input(shape=(4096,))
    fe1_2 = Dropout(0.5)(inputs3)
    fe2_2 = Dense(EMBEDDING_DIM, activation='relu')(fe1_2)
    fe3_2 = RepeatVector(max_length2)(fe2_2)
    inputs4 = Input(shape=(max_length2,))
    emb2_2 = Embedding(vocab_size2, EMBEDDING_DIM, mask_zero=True)(inputs4)
    merged2 = concatenate([fe3_2, emb2_2], name='concat2')
    lm2_2 = LSTM(500, return_sequences=False)(merged2)
    # merge
    merged3 = concatenate([lm2_1, lm2_2], name='concat3')  # error
    outputs = Dense(vocab_size1, activation='softmax')(merged3)
    outputs1 = Dense(vocab_size2, activation='softmax')(merged3)
    # tie it together [image, seq] [word]
    model = Model(inputs=[inputs1, inputs2, inputs3, inputs4], outputs=[outputs, outputs1])
    model.compile(loss=['categorical_crossentropy', 'categorical_crossentropy'], optimizer='adam', metrics=['accuracy'])
    print(model.summary())
    # plot_model(model, show_shapes=True, to_file='model.png')
    return model
I can initialize it correctly:
model = define_model(fvocab_size, fmax_length, wvocab_size, wmax_length)
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 4096)] 0
__________________________________________________________________________________________________
input_3 (InputLayer) [(None, 4096)] 0
__________________________________________________________________________________________________
dropout (Dropout) (None, 4096) 0 input_1[0][0]
__________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0 input_3[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 256) 1048832 dropout[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, 34)] 0
__________________________________________________________________________________________________
dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
__________________________________________________________________________________________________
input_4 (InputLayer) [(None, 21)] 0
__________________________________________________________________________________________________
repeat_vector (RepeatVector) (None, 34, 256) 0 dense[0][0]
__________________________________________________________________________________________________
embedding (Embedding) (None, 34, 256) 1940224 input_2[0][0]
__________________________________________________________________________________________________
repeat_vector_1 (RepeatVector) (None, 21, 256) 0 dense_1[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, 21, 256) 1428992 input_4[0][0]
__________________________________________________________________________________________________
concat1 (Concatenate) (None, 34, 512) 0 repeat_vector[0][0]
embedding[0][0]
__________________________________________________________________________________________________
concat2 (Concatenate) (None, 21, 512) 0 repeat_vector_1[0][0]
embedding_1[0][0]
__________________________________________________________________________________________________
lstm (LSTM) (None, 500) 2026000 concat1[0][0]
__________________________________________________________________________________________________
lstm_1 (LSTM) (None, 500) 2026000 concat2[0][0]
__________________________________________________________________________________________________
concat3 (Concatenate) (None, 1000) 0 lstm[0][0]
lstm_1[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 7579) 7586579 concat3[0][0]
__________________________________________________________________________________________________
dense_3 (Dense) (None, 5582) 5587582 concat3[0][0]
==================================================================================================
Total params: 22,693,041
Trainable params: 22,693,041
Non-trainable params: 0
The input shapes of the Concatenate layer are (None, 500) and (None, 500), and its output is (None, 1000). However, when passing actual data through the generator, I get an error:
`InvalidArgumentError Traceback (most recent call last)
<ipython-input-15-e52b85d1307b> in <module>()
12
13 model.fit(train_generator, epochs=20, verbose=1, steps_per_epoch=steps, validation_steps=val_steps,
---> 14 callbacks=[checkpoint], validation_data=val_generator)
15
16 try:
6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
1098 _r=1):
1099 callbacks.on_train_batch_begin(step)
-> 1100 tmp_logs = self.train_function(iterator)
1101 if data_handler.should_sync:
1102 context.async_wait()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
886 # Lifting succeeded, so variables are initialized and we can run the
887 # stateless function.
--> 888 return self._stateless_fn(*args, **kwds)
889 else:
890 _, _, _, filtered_flat_args = \
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
2941 filtered_flat_args) = self._maybe_define_function(args, kwargs)
2942 return graph_function._call_flat(
-> 2943 filtered_flat_args, captured_inputs=graph_function.captured_inputs) # pylint: disable=protected-access
2944
2945 @property
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args, captured_inputs, cancellation_manager)
1917 # No tape is watching; skip to running the function.
1918 return self._build_call_outputs(self._inference_function.call(
-> 1919 ctx, args, cancellation_manager=cancellation_manager))
1920 forward_backward = self._select_forward_and_backward_functions(
1921 args,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args, cancellation_manager)
558 inputs=args,
559 attrs=attrs,
--> 560 ctx=ctx)
561 else:
562 outputs = execute.execute_with_cancellation(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 60 inputs, attrs, num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
InvalidArgumentError: All dimensions except 1 must match. Input 1 has shape [4 500] and doesn't match input 0 with shape [47 500].
[[node gradient_tape/model/concat3/ConcatOffset (defined at <ipython-input-15-e52b85d1307b>:14) ]] [Op:__inference_train_function_14543]
Function call stack:
train_function`
Code of the generator:
import numpy as np
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

def create_sequences(tokenizer, max_length, desc_list, photo):
    vocab_size = len(tokenizer.word_index) + 1
    X1, X2, y = [], [], []
    # walk through each description for the image
    for desc in desc_list:
        # encode the sequence
        seq = tokenizer.texts_to_sequences([desc])[0]
        # split one sequence into multiple X,y pairs
        for i in range(1, len(seq)):
            # split into input and output pair
            in_seq, out_seq = seq[:i], seq[i]
            # pad input sequence
            in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
            # encode output sequence
            out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
            # store
            X1.append(photo)
            X2.append(in_seq)
            y.append(out_seq)
    return np.array(X1), np.array(X2), np.array(y)

def double_generator(descriptions1, photos1, tokenizer1, max_length1,
                     descriptions2, photos2, tokenizer2, max_length2, n_step=1):
    while True:
        # loop over photo identifiers in the dataset
        keys1 = list(descriptions1.keys())
        keys2 = list(descriptions2.keys())  # len(keys1) == len(keys2)
        for i in range(0, len(keys1), n_step):
            Ximages1, XSeq1, y1 = list(), list(), list()
            Ximages2, XSeq2, y2 = list(), list(), list()
            for j in range(i, min(len(keys1), i+n_step)):
                image_id1 = keys1[j]
                # retrieve the photo feature
                photo1 = photos1[image_id1][0]
                desc_list1 = descriptions1[image_id1]
                in_img1, in_seq1, out_word1 = create_sequences(tokenizer1, max_length1, desc_list1, photo1)
                for k in range(len(in_img1)):
                    Ximages1.append(in_img1[k])
                    XSeq1.append(in_seq1[k])
                    y1.append(out_word1[k])
            for j in range(i, min(len(keys2), i+n_step)):
                image_id2 = keys2[j]
                # retrieve the photo feature
                photo2 = photos2[image_id2][0]
                desc_list2 = descriptions2[image_id2]
                in_img2, in_seq2, out_word2 = create_sequences(tokenizer2, max_length2, desc_list2, photo2)
                for k in range(len(in_img2)):
                    Ximages2.append(in_img2[k])
                    XSeq2.append(in_seq2[k])
                    y2.append(out_word2[k])
            yield ([np.array(Ximages1), np.array(XSeq1), np.array(Ximages2), np.array(XSeq2)],
                   [np.array(y1), np.array(y2)])
Everything works fine when there is only one dataset and no LSTM concatenation (i.e., with simple image captioning).
The shapes of the inputs in the error change when I call next(generator) and, as I understand it, correlate with the description length, even though I use padding.
The Keras tutorial on the functional API contains an example similar to mine, called "Manipulate complex graph topologies" (https://keras.io/guides/functional_api/), which also concatenates LSTMs, and I don't see why it isn't working in my case without any reshaping.
I tried:
Thanks in advance
Upvotes: 0
Views: 1363
Reputation: 19307
You are trying to send 47 samples and 4 samples for different inputs via the generator at the same time. The model compiles because the first dimension, None, accepts a variable batch size. But when the tensors shaped (47, 500) and (4, 500) coming out of the two LSTMs reach the concatenate layer, that layer cannot join them, because their batch sizes differ. So you get an error while training and not while compiling.
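A minimal standalone reproduction of that shape clash (the 47/4 batch sizes are taken from the traceback; the tensors here are just zero placeholders, not the asker's data):

import tensorflow as tf

lstm_out_1 = tf.zeros((47, 500))  # batch coming out of the first LSTM branch
lstm_out_2 = tf.zeros((4, 500))   # batch coming out of the second LSTM branch

try:
    # the same operation concat3 performs: join along the feature axis
    tf.concat([lstm_out_1, lstm_out_2], axis=-1)
except tf.errors.InvalidArgumentError as err:
    print(err)  # dimensions other than the concat axis must match: [47,500] vs [4,500]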
If you are trying to generate a single sample (one row of data) at a time via your generator, then perhaps you really have 2D inputs shaped (47, 4096) and (4, 4096). In that case, you should reshape them to (1, 47, 4096) and (1, 4, 4096). This would change your architecture completely, but it would be in line with what I think you are trying to do.
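If you go that route, a rough sketch of the reshape (the array names and sizes are illustrative only; the model's Input layers would then have to accept 3D, variable-length inputs, so the architecture changes as noted above):

import numpy as np

Ximages1 = np.zeros((47, 4096))       # all (photo, word) pairs built for one image in dataset 1
Ximages2 = np.zeros((4, 4096))        # all pairs built for one image in dataset 2

Ximages1 = Ximages1[np.newaxis, ...]  # (1, 47, 4096): one sample with 47 timesteps
Ximages2 = Ximages2[np.newaxis, ...]  # (1, 4, 4096):  one sample with 4 timesteps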
The issue is that you are passing different-sized batches as inputs to the model, because the first dimension, None, holds the batch size.
Let's look at what happens in your model step by step, for just two of the inputs (Ximages1 and Ximages2).
For each batch from the generator, you first pass:
Input layers -
input_1 (InputLayer) [(None, 4096)] #(47, 4096) Ximages1
input_3 (InputLayer) [(None, 4096)] #(4, 4096) Ximages2
These go into intermediate layers until they reach the individual LSTMs.
LSTM Layers -
lstm (LSTM) (None, 500) concat1[0][0] #(47, 500)
lstm_1 (LSTM) (None, 500) concat2[0][0] #(4, 500)
Now the next layer, concat3 (Concatenate), tries to combine the two LSTM outputs into a single tensor:
concat3 (Concatenate) (None, 1000) lstm[0][0] #(47, 500)
lstm_1[0][0] #(4, 500)
From an architecture point of view, it can concatenate a (None, 500) with another (None, 500) because the first dimension (batch_size) is left flexible; the assumption, however, is that the layer receives the same number of samples from both inputs in every batch. In other words, you can't concatenate a (47, 500) with a (4, 500), because their first axes (the batch sizes) don't match.
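If you instead keep the current architecture, the batches coming out of double_generator have to be the same size for both branches. One way to do that (a sketch only, reusing the variable names from the question; trim_to_equal_batches is a hypothetical helper, and simply truncating to the smaller branch discards samples, so whether that is acceptable depends on your data):

import numpy as np

def trim_to_equal_batches(Ximages1, XSeq1, y1, Ximages2, XSeq2, y2):
    # keep only as many samples as the smaller branch produced, so that
    # lstm and lstm_1 reach concat3 with identical batch sizes
    n = min(len(Ximages1), len(Ximages2))
    return ([np.array(Ximages1[:n]), np.array(XSeq1[:n]),
             np.array(Ximages2[:n]), np.array(XSeq2[:n])],
            [np.array(y1[:n]), np.array(y2[:n])])

# inside double_generator, the final yield would become:
# yield trim_to_equal_batches(Ximages1, XSeq1, y1, Ximages2, XSeq2, y2)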
Upvotes: 3