Reputation: 323
I'm really stuck building a NN for text classification with Keras, using an LSTM and adding an attention layer on top. I'm sure I'm pretty close, but I'm confused:
Do I have to add a TimeDistributed dense layer after the LSTM?
And how do I retrieve the attention weights from my network (for visualization purposes), so that I know which sentence was 'responsible' for the document being classified as good or bad?
Say I have 10 documents, each consisting of 100 sentences, and each sentence is represented as a 500-element vector. So my document matrix containing the sentence sequences looks like: X = np.array(Matrix).reshape(10, 100, 500)
The documents should be classified according to their sentiment (1 = good, 0 = bad), so:
y= [1,0,0,1,1]
yy= np.array(y)
I don't need an embedding layer because each sentence of each document is already a sparse vector.
The attention layer is taken from: https://github.com/richliao/textClassifier/blob/master/textClassifierHATT.py
import numpy as np
from keras.layers import Input, Dense, LSTM
from keras.models import Model
# AttLayer is the custom attention layer from the linked textClassifierHATT.py

MAX_SENTS = 100
MAX_SENT_LENGTH = 500

review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH))
l_lstm_sent = LSTM(100, activation='tanh', return_sequences=True)(review_input)
l_att_sent = AttLayer(100)(l_lstm_sent)
preds = Dense(1, activation='softmax')(l_att_sent)
model = Model(review_input, preds)
model.compile(loss='binary_crossentropy',
optimizer='rmsprop',
metrics=['acc'])
model.fit(X, yy, nb_epoch=10, batch_size=50)
So I think my model should be set up correctly, but I'm not quite sure. And how do I get the attention weights from it (e.g. so that I know which sentence caused a classification as 1)? Help much appreciated.
Upvotes: 0
Views: 564
Reputation: 11213
1. TimeDistributed
In this case, you don't have to wrap Dense in TimeDistributed, although it may be a little bit faster if you do, especially if you can provide a mask that masks out a large part of the LSTM output.
However, Dense operates on the last dimension no matter what the shape before the last dimension is.
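For example, a minimal sketch of what that means for the 3D LSTM output in this model (hypothetical layer sizes, and assuming a Keras version where Dense broadcasts over the leading dimensions, as described above):

from keras.layers import TimeDistributed  # plus the imports already used in the question
# Sketch only: Dense on a 3D tensor acts on the last axis,
# so it is equivalent to wrapping it in TimeDistributed.
h = LSTM(100, return_sequences=True)(review_input)   # shape (batch, MAX_SENTS, 100)
d1 = Dense(64)(h)                                     # shape (batch, MAX_SENTS, 64)
d2 = TimeDistributed(Dense(64))(h)                    # same shape, explicit per-timestep wrapper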
2. Attention weights
Yes, it is as you suggest in the comment. You need to modify the AttLayer so that it is capable of returning both its output and the attention weights:
return output, ait
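For example, the end of the call method in the linked implementation might be changed roughly like this (a sketch only; variable names follow the linked textClassifierHATT.py, and the exact code depends on which revision of that layer you copied). Note that a layer returning two tensors also has to report two output shapes:

# Hypothetical sketch of the modified AttLayer methods (K is keras.backend)
def call(self, x, mask=None):
    uit = K.tanh(K.dot(x, self.W) + self.b)              # (batch, sents, attention_dim)
    ait = K.exp(K.squeeze(K.dot(uit, self.u), -1))       # unnormalized scores, (batch, sents)
    ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
    weighted_input = x * K.expand_dims(ait)              # weight each sentence vector
    output = K.sum(weighted_input, axis=1)               # (batch, features)
    return [output, ait]                                 # also return the attention weights

def compute_output_shape(self, input_shape):
    return [(input_shape[0], input_shape[-1]),           # pooled sentence representation
            (input_shape[0], input_shape[1])]            # one weight per sentence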
And then create a model that contains both prediction and attention weight tensors and get the predictions for them:
l_att_sent, att_weights = AttLayer(100)(l_lstm_sent)
...
predictions, att_weights = attmodel.predict(X)
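Put together, a sketch of the whole flow (under the assumptions above: the modified AttLayer returns the weights as a second tensor, and attmodel is just a name for a second model built from the same tensors, so it shares the trained weights). Note that with a single output unit and binary_crossentropy, a sigmoid activation is the usual choice; a softmax over one unit always outputs 1:

# Hypothetical sketch: train on the predictions, then read the attention
# weights out of a second model that shares the same layers.
l_att_sent, att_weights = AttLayer(100)(l_lstm_sent)   # modified layer returns both tensors
preds = Dense(1, activation='sigmoid')(l_att_sent)     # sigmoid for binary_crossentropy

model = Model(review_input, preds)
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(X, yy, nb_epoch=10, batch_size=50)

attmodel = Model(review_input, [preds, att_weights])   # needs no training of its own
predictions, weights = attmodel.predict(X)
# weights has shape (num_docs, MAX_SENTS): one attention weight per sentence, so
# weights.argmax(axis=1) gives the most influential sentence in each document.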
Upvotes: 1