Felix

Reputation: 323

LSTM with attention: getting the weights? Classifying documents based on sentence embeddings

I'm really stuck building a NN for text classification with Keras, using an LSTM and adding an attention layer on top. I'm sure I'm pretty close, but I'm confused:

Say I have 10 documents, each consisting of 100 sentences, and each sentence is represented as a 500-element vector. So my document matrix containing the sentence sequences looks like: X = np.array(Matrix).reshape(10, 100, 500)

Each document should be classified with a sentiment label, 1=good; 0=bad - so for the 10 documents:

y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
yy = np.array(y)

I don't need an embedding layer because each sentence of each document is already a sparse vector.

The attention layer is taken from: https://github.com/richliao/textClassifier/blob/master/textClassifierHATT.py

from keras.layers import Input, LSTM, Dense
from keras.models import Model
# AttLayer as defined in textClassifierHATT.py (linked above)

MAX_SENTS = 100
MAX_SENT_LENGTH = 500

review_input = Input(shape=(MAX_SENTS, MAX_SENT_LENGTH))
l_lstm_sent = LSTM(100, activation='tanh', return_sequences=True)(review_input)
l_att_sent = AttLayer(100)(l_lstm_sent)
preds = Dense(1, activation='sigmoid')(l_att_sent)  # sigmoid, not softmax, for a single binary output
model = Model(review_input, preds)

model.compile(loss='binary_crossentropy',
              optimizer='rmsprop',
              metrics=['acc'])
model.fit(X, yy, epochs=10, batch_size=50)

So I think my model should be set up correctly, but I'm not quite sure. How do I get the attention weights out of it (e.g. so I know which sentences caused a classification as 1)? Any help is much appreciated.

Upvotes: 0

Views: 564

Answers (1)

Jindřich

Reputation: 11213

1. Time distributed

In this case, you don't have to wrap Dense into TimeDistributed, although it may be a little bit faster if you do, especially if you can provide a mask that masks out a large part of the LSTM output.

However, Dense operates on the last dimension regardless of the shape of the preceding dimensions.
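A quick way to see this (a minimal sketch, assuming a Keras version where Dense accepts 3-D input, i.e. Keras 2.x; the shapes here are illustrative):

import numpy as np
from keras.layers import Input, Dense, TimeDistributed
from keras.models import Model

inp = Input(shape=(100, 200))  # (timesteps, features)
plain = Model(inp, Dense(64)(inp))                     # output: (None, 100, 64)
wrapped = Model(inp, TimeDistributed(Dense(64))(inp))  # output: (None, 100, 64)

x = np.random.rand(2, 100, 200)
print(plain.predict(x).shape)    # (2, 100, 64)
print(wrapped.predict(x).shape)  # (2, 100, 64)

Both variants apply the same affine map to every timestep; only the potential masking/speed behaviour differs.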

2. Attention weights

Yes, it is as you suggest in the comment. You need to modify the AttLayer so that it is capable of returning both its output and the attention weights:

return output, ait

And then create a model that contains both the prediction and the attention-weight tensors, and get predictions for both:

l_att_sent, att_weights = AttLayer(100)(l_lstm_sent)
...
predictions, att_weights = attmodel.predict(X)
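For reference, here is a sketch of what the modified layer could look like. It is adapted from the AttLayer in the linked textClassifierHATT.py; the exact imports and Layer API details depend on your Keras version, so treat it as an outline rather than a drop-in replacement:

from keras import backend as K
from keras import initializers
from keras.engine.topology import Layer

class AttLayer(Layer):
    def __init__(self, attention_dim, **kwargs):
        self.init = initializers.get('normal')
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        # input_shape: (batch, timesteps, features)
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)), name='W')
        self.b = K.variable(self.init((self.attention_dim,)), name='b')
        self.u = K.variable(self.init((self.attention_dim, 1)), name='u')
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def call(self, x, mask=None):
        # ait: one normalized weight per timestep (here: per sentence)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        ait = K.exp(K.squeeze(K.dot(uit, self.u), -1))
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        output = K.sum(x * K.expand_dims(ait), axis=1)
        return [output, ait]  # attention weights returned as a second output

    def compute_output_shape(self, input_shape):
        return [(input_shape[0], input_shape[-1]),
                (input_shape[0], input_shape[1])]

Used with the model from the question, the weights for each document then come back alongside the predictions (attmodel is just an illustrative name for the two-output model):

l_att_sent, att_weights = AttLayer(100)(l_lstm_sent)
preds = Dense(1, activation='sigmoid')(l_att_sent)
attmodel = Model(review_input, [preds, att_weights])
predictions, weights = attmodel.predict(X)  # weights has shape (10, 100)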

Upvotes: 1
