Reputation: 85
I've been trying to implement this BiLSTM in Keras: https://github.com/ffancellu/NegNN
Here is where I'm at, and it kind of works:
import keras
from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, CuDNNLSTM, Dropout, Dense
from keras.callbacks import ModelCheckpoint, EarlyStopping

inputs_w = Input(shape=(sequence_length,), dtype='int32')
inputs_pos = Input(shape=(sequence_length,), dtype='int32')
inputs_cue = Input(shape=(sequence_length,), dtype='int32')
w_emb = Embedding(vocabulary_size+1, embedding_dim, input_length=sequence_length, trainable=False)(inputs_w)
p_emb = Embedding(tag_voc_size+1, embedding_dim, input_length=sequence_length, trainable=False)(inputs_pos)
c_emb = Embedding(2, embedding_dim, input_length=sequence_length, trainable=False)(inputs_cue)
summed = keras.layers.add([w_emb, p_emb, c_emb])
BiLSTM = Bidirectional(CuDNNLSTM(hidden_dims, return_sequences=True))(summed)
DPT = Dropout(0.2)(BiLSTM)
outputs = Dense(2, activation='softmax')(DPT)
checkpoint = ModelCheckpoint('bilstm_one_hot.hdf5', monitor='val_loss', verbose=1, save_best_only=True, mode='auto')
early = EarlyStopping(monitor='val_loss', min_delta=0.0001, patience=5, verbose=1, mode='auto')
model = Model(inputs=[inputs_w, inputs_pos, inputs_cue], outputs=outputs)
model.compile('adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.summary()
model.fit([X_train, X_pos_train, X_cues_train], Y_train, batch_size=batch_size, epochs=num_epochs, verbose=1, validation_split=0.2, callbacks=[early, checkpoint])
In the original TensorFlow code, the author uses masking and softmax cross entropy with logits. I don't see how to implement this in Keras yet. If you have any advice, don't hesitate.
My main issue here is with return_sequences=True. The author doesn't appear to be using it in his TensorFlow implementation, and when I set it to False, I get this error:
ValueError: Error when checking target: expected dense_1 to have 2 dimensions, but got array with shape (820, 109, 2)
I also tried using:
outputs = TimeDistributed(Dense(2, activation='softmax'))(BiLSTM)
which raises an AssertionError without any further information.
Any ideas?
Thanks
Upvotes: 0
Views: 832
Reputation: 6002
the author uses masking and softmax cross entropy with logits. I don't get how to implement this in Keras yet.
Regarding softmax cross entropy with logits, you are doing it correctly. Using softmax_cross_entropy_with_logits as the loss function with no activation on the last layer is equivalent to your approach with categorical_crossentropy as the loss and a softmax activation on the last layer; the only difference is that the latter is numerically less stable. If this turns out to be an issue for you, you can (if your Keras backend is TensorFlow) use tf.nn.softmax_cross_entropy_with_logits as your loss and drop the final softmax. If you use another backend, you will have to look for an equivalent there.
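A minimal sketch of what that could look like, assuming a TensorFlow backend (the wrapper name softmax_ce_with_logits is just an illustration; the wrapper is needed so the TF op matches Keras' (y_true, y_pred) loss signature):
import tensorflow as tf

def softmax_ce_with_logits(y_true, y_pred):
    # y_pred must be raw logits here, i.e. the last Dense layer has no activation
    return tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_pred)

# outputs = Dense(2)(DPT)   # note: no softmax
# model.compile('adam', loss=softmax_ce_with_logits, metrics=['accuracy'])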
Regarding masking, I'm not sure if I fully understand what the author is doing. However, in Keras the Embedding layer has a mask_zero parameter that you can set to True. In that case all timesteps that have a 0 will be ignored in all further calculations. In your source it is not 0 that is being masked, though, so you would have to adjust the indices accordingly. If that doesn't work, there is the Masking layer in Keras that you can put before your recurrent layer, but I have little experience with that.
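A sketch of both options, reusing the names from your snippet (vocabulary_size, summed, hidden_dims etc. are assumed to be defined as in your code). Note that, as far as I know, CuDNNLSTM does not support masking, so the plain LSTM layer would be needed for the mask to take effect:
from keras.layers import Embedding, Masking, Bidirectional, LSTM

# Variant 1: reserve index 0 for padding and let the embedding produce the mask
w_emb = Embedding(vocabulary_size + 1, embedding_dim, input_length=sequence_length,
                  mask_zero=True,  # timesteps with value 0 are ignored downstream
                  trainable=False)(inputs_w)

# Variant 2: an explicit Masking layer in front of the recurrent layer
# masked = Masking(mask_value=0.0)(summed)
# BiLSTM = Bidirectional(LSTM(hidden_dims, return_sequences=True))(masked)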
My main issue here is with return_sequences=True. The author doesn't appear to be using it
What makes you think he doesn't use it? Just because the keyword does not appear in the code doesn't mean anything. But I'm not sure either; the code is pretty old and I can no longer find the docs that would tell what the defaults were.
Anyway, if you want to use return_sequences=False (for whatever reason), be aware that this changes the output shape of the layer:
- with return_sequences=True the output shape is (batch_size, timesteps, features)
- with return_sequences=False the output shape is (batch_size, features)
The error you are getting is basically telling you that your network's output has one dimension less than the target y values you are feeding it. So, to me it looks like return_sequences=True is just what you need, but without further information it is hard to tell.
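A toy illustration of how the shapes work out (the layer sizes are made up; only the last dimension of the target matters):
from keras.layers import Input, Bidirectional, LSTM, Dense
from keras.models import Model

inp = Input(shape=(109, 8))                                   # (batch, timesteps, features)
seq = Bidirectional(LSTM(16, return_sequences=True))(inp)     # (batch, 109, 32)
out_per_step = Dense(2, activation='softmax')(seq)            # (batch, 109, 2) -> matches y of shape (820, 109, 2)
last = Bidirectional(LSTM(16, return_sequences=False))(inp)   # (batch, 32)
out_single = Dense(2, activation='softmax')(last)             # (batch, 2) -> would need y of shape (820, 2)
Model(inp, [out_per_step, out_single]).summary()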
Then, regarding TimeDistributed. I'm not quite sure what you are trying to achieve with it, but quoting from the docs:
This wrapper applies a layer to every temporal slice of an input.
The input should be at least 3D, and the dimension of index one will be considered to be the temporal dimension.
(emphasis is mine)
I'm not sure from your question in which scenario the empty assertion occurs. If you have a recurrent layer with return_sequences=False before it, you are again missing a dimension (I can't tell you why the assertion is empty, though). If you have a recurrent layer with return_sequences=True before it, it should work, but it would be completely useless, as Dense is applied in a time-distributed way anyway. If I'm not mistaken, this behavior of the Dense layer was changed in some older Keras version (they should really update the example there and stop using Dense!). As the code you are referring to is quite old, it's well possible that TimeDistributed was needed back then but is not needed anymore.
If your plan was to restore the missing dimension, TimeDistributed won't help you, but RepeatVector would. But, as already said, in that case better use return_sequences=True in the first place.
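For completeness, a sketch of what RepeatVector would do, again reusing summed, hidden_dims and sequence_length from your snippet:
from keras.layers import RepeatVector, Bidirectional, CuDNNLSTM

last = Bidirectional(CuDNNLSTM(hidden_dims))(summed)      # (batch, 2 * hidden_dims), last timestep only
repeated = RepeatVector(sequence_length)(last)            # (batch, sequence_length, 2 * hidden_dims)
# every timestep now carries the same vector, which is why return_sequences=True is the better option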
Upvotes: 1
Reputation: 941
The problem is that your target values are time distributed: you have 109 timesteps, each with a one-hot target vector of size two. This is why you need return_sequences=True; otherwise only the last timestep would be fed to the Dense layer and you would get a single output per sequence.
So, depending on what you need, either keep it as it is now, or, if only the last timestep is enough for you, drop return_sequences=True, but then you would need to adjust the y values accordingly.
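A sketch of what that adjustment could look like, assuming Y_train has shape (820, 109, 2) as in your error message:
Y_last = Y_train[:, -1, :]   # keep only the label of the final timestep -> shape (820, 2)

# BiLSTM  = Bidirectional(CuDNNLSTM(hidden_dims, return_sequences=False))(summed)
# outputs = Dense(2, activation='softmax')(BiLSTM)
# model.fit([X_train, X_pos_train, X_cues_train], Y_last, batch_size=batch_size, epochs=num_epochs)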
Upvotes: 1