Reputation: 2104
I'm working with a dataset in which batch items are texts represented by matrices with shape (max_sentences_per_text, max_tokens_per_sentence). It goes through an embedding layer (becoming 3d) and then a time distributed LSTM that outputs one vector for each sentence (back to 2d). Then, a second LSTM layer reads all the sentence vectors and outputs a final vector for each batch item, which can go through normal dense layers.
This is illustrated below (generated with keras.utils.plot_model), with 85 sentences per text and 40 tokens per sentence:
Here is the model code:
import keras
from keras import initializers
from keras.layers import Dense, Embedding, Input, LSTM, TimeDistributed

# One batch item is a (num_sentences, max_sentence_size) matrix of token ids.
inputs = Input([num_sentences, max_sentence_size])
vocab_size, embedding_size = embeddings.shape
init = initializers.constant(embeddings)
# Frozen pretrained embeddings; index 0 is treated as padding and masked.
emb_layer = Embedding(vocab_size, embedding_size, mask_zero=True,
                      embeddings_initializer=init)
emb_layer.trainable = False
embedded = emb_layer(inputs)
projection_layer = Dense(lstm1_units, activation=None, use_bias=False,
                         name='projection')
projected = projection_layer(embedded)
# Token-level LSTM applied to each sentence: one vector per sentence.
lstm1 = LSTM(lstm1_units, name='token_lstm')
sentence_vectors = TimeDistributed(lstm1)(projected)
# Sentence-level LSTM: one vector per text.
lstm2 = LSTM(lstm2_units, name='sentence_lstm')
final_vector = lstm2(sentence_vectors)
hidden = Dense(hidden_units, activation='relu', name='hidden')(final_vector)
scores = Dense(num_scores, activation='sigmoid', name='scorer')(hidden)
model = keras.models.Model(inputs, scores)
This looks fine to me, but I get the following error:
Traceback (most recent call last):
File "src/network.py", line 43, in <module>
network.fit(x, y, validation_data=(xval, yval))
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1507, in fit
initial_epoch=initial_epoch)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/engine/training.py", line 1156, in _fit_loop
outs = f(ins_batch)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2269, in __call__
**self.session_kwargs)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Inputs to operation sentence_lstm/while/Select_2 of type Select must have the same size and shape. Input 0: [32,4000] != input 1: [32,100]
[[Node: sentence_lstm/while/Select_2 = Select[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](sentence_lstm/while/Tile, sentence_lstm/while/add_5, sentence_lstm/while/Identity_3)]]
Caused by op u'sentence_lstm/while/Select_2', defined at:
File "src/network.py", line 37, in <module>
args.hidden_units)
File "src/model.py", line 51, in create_model
final_vector = lstm2(sentence_vectors)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/layers/recurrent.py", line 262, in __call__
return super(Recurrent, self).__call__(inputs, **kwargs)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/engine/topology.py", line 596, in __call__
output = self.call(inputs, **kwargs)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/layers/recurrent.py", line 341, in call
input_length=input_shape[1])
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2538, in rnn
swap_memory=True)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2605, in while_loop
result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2438, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2388, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/keras/backend/tensorflow_backend.py", line 2509, in _step
new_states = [tf.where(tiled_mask_t, new_states[i], states[i]) for i in range(len(states))]
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 2301, in where
return gen_math_ops._select(condition=condition, t=x, e=y, name=name)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2386, in _select
name=name)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2327, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/Users/erick/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1226, in __init__
self._traceback = _extract_stack()
InvalidArgumentError (see above for traceback): Inputs to operation sentence_lstm/while/Select_2 of type Select must have the same size and shape. Input 0: [32,4000] != input 1: [32,100]
[[Node: sentence_lstm/while/Select_2 = Select[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"](sentence_lstm/while/Tile, sentence_lstm/while/add_5, sentence_lstm/while/Identity_3)]]
The training call is network.fit(x, y, validation_data=(xval, yval)), with the following shapes:
In [89]: x.shape
Out[89]: (1000, 85, 40)
In [90]: y.shape
Out[90]: (1000, 5)
In [91]: xval.shape
Out[91]: (500, 85, 40)
In [92]: yval.shape
Out[92]: (500, 5)
Upvotes: 0
Views: 902
Reputation: 4489
Moved from question:
UPDATE: After much searching, I found that the problem is that TimeDistributed doesn't work with masking. I could get the model to run by wrapping the embedding layer call as TimeDistributed(emb_layer)(inputs), but that would disable masking for the whole model.
This is a known issue in Keras, with no plans for a fix so far:
https://github.com/fchollet/keras/issues/4786 https://github.com/fchollet/keras/issues/3030
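To make the shape mismatch in the error concrete, here is a rough NumPy sketch of what seems to be happening. It is not Keras code. It assumes lstm2_units is 100 (inferred from the [32,100] in the message) and that the backend tiles each step's mask once per LSTM unit, as the tf.where line in the traceback suggests. The token ids are random placeholders. The per-token mask from the embedding layer reaches the sentence LSTM unchanged, so each step's mask has 40 columns instead of 1:

```python
import numpy as np

batch, num_sentences, max_tokens = 32, 85, 40
lstm2_units = 100  # assumption: taken from the [32,100] in the error message

# Placeholder token ids; 0 is the padding value that mask_zero=True masks out.
rng = np.random.RandomState(0)
tokens = rng.randint(0, 5, size=(batch, num_sentences, max_tokens))

# Mask as produced by the embedding layer: one flag per token.
token_mask = tokens != 0                         # (32, 85, 40)

# The sentence LSTM steps over axis 1, so at one step it sees this slice.
mask_t = token_mask[:, 0, :]                     # (32, 40)
state = np.zeros((batch, lstm2_units))           # (32, 100)

# Tiling the step mask once per unit gives 40 * 100 columns.
tiled = np.tile(mask_t, (1, lstm2_units))        # (32, 4000)

# A per-sentence mask (True if any token is real) would have the right shape.
sentence_mask = token_mask.any(axis=-1)          # (32, 85)
mask_ok = sentence_mask[:, 0:1]                  # (32, 1)
tiled_ok = np.tile(mask_ok, (1, lstm2_units))    # (32, 100)

print(tiled.shape, state.shape, tiled_ok.shape)
```

If that reading is right, the sentence LSTM would need a mask of shape (batch, num_sentences) rather than the per-token one it currently receives.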
Upvotes: 1
Reputation: 4348
Ok, I think I found the error.
final_vector = lstm2(sentence_vectors)
should be
final_vector = (lstm2)(sentence_vectors)
Otherwise you are calling lstm2 as a function with sentence_vectors as arguments.
Upvotes: 0