Reputation: 1000
I am new to Keras and LSTMs but not to other NNs. The goal is to classify parts of a sentence into 4 mutually exclusive categories. I want to use the LSTM to score each word of the sentence against the labels, using the context around the word. Ex:
[carrots peas beef chicken coward chicken]
to:
[food food food food person person]
My input vectors are arrays of word indices which I run through an Embedding layer. I then want to feed the words to the LSTM and train it on the output classifications, so that it learns which words, in which contexts, fall into which classes.
x_in = [[2,6,3,74,45,...], [...], ...]
y_in = [[0,0,0,1], [0,1,0,0], [...], ...]
x_in_padded = pad_sequences(x_in, maxlen=max_len, padding='post')
model = Sequential()
model.add(Embedding(len(words), 128)) # examples use 128 or 256...
model.add(LSTM(4, return_sequences=True)) #try a 4-word context
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print model.summary()
model.fit(x_in_padded, y_in, batch_size=16, epochs=10)
loss, accuracy = model.evaluate(x_in_padded, y_in, batch_size=16)
However I am getting:
ValueError: Error when checking target:
expected dense_1 to have 3 dimensions, but got array with shape (968, 1)
968 is the number of sentences in x_in_padded and vectors in y_in. What am I doing wrong?
** UPDATE **
I've been iterating on it and I still have issues with the Embedding and LSTM layer dimensionality. Here's the code; I've made it a self-contained example.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Embedding
from keras.layers import LSTM
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.utils.layer_utils import print_summary
import numpy as np
sentences = [
['the', 'imdb', 'review', 'data', 'does', 'have', 'a', 'one', 'dimensional', 'spatial', 'structure', 'in', 'the', 'sequence'],
['of', 'words', 'in', 'reviews', 'and', 'the', 'cnn', 'may', 'be', 'able', 'to', 'pick', 'out'],
['invariant', 'features', 'for', 'good', 'and', 'bad', 'sentiment', 'this', 'learned', 'spatial'],
['features', 'may', 'then', 'be', 'learned', 'as', 'sequences', 'by', 'an', 'lstm', 'layer']
]
outputs = [
['class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class2', 'class2', 'class2'],
['class2', 'class2', 'class2', 'class2', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class2', 'class2', 'class2'],
['class1', 'class1', 'class2', 'class2', 'class2', 'class2', 'class2', 'class1', 'class1', 'class1'],
['class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class2', 'class2', 'class2', 'class2']
]
words = sorted(set(word for sentence in sentences for word in sentence))
x_in = [ [words.index(word) for word in sentence] for sentence in sentences ]
out_classes = {'class1': [1,0], 'class2': [0,1]}
y_in = [ [out_classes[label] for label in labels] for labels in outputs ]
max_len = max([len(sentence) for sentence in sentences])
x_in_padded = pad_sequences(x_in, maxlen=max_len, padding='post')
x_in_padded = np.reshape(x_in_padded, (x_in_padded.shape[0], x_in_padded.shape[1], 1)) # add a trailing feature dimension: (batch, max_len, 1)
print "x_in:"
print x_in_padded[2]
print x_in_padded.shape
print "y_in:"
y_in = np.array(y_in)
model = Sequential()
model.add(LSTM(4, return_sequences=False, input_shape=(None, 1)))
model.add(Dense(len(out_classes), activation="softmax"))
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
print_summary(model)
model.fit(x_in_padded, y_in, epochs=10)
loss, accuracy = model.evaluate(x_in_padded, y_in)
My current error, raised once the first epoch starts, is: ValueError: setting an array element with a sequence.
Upvotes: 1
Views: 328
Reputation: 40506
Two things here:
model.add(LSTM(4, return_sequences=True)) #try a 4-word context
This line outputs a sequence of numbers passed through the default tanh activation, which is not suitable for your task. Add a softmax output from a Dense layer on top of your network:
model.add(LSTM(10, return_sequences=True)) #10 is arbitrary - try other values.
model.add(Dense(4, activation='softmax')) # Output layer.
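For reference, here is how the shapes flow through such a stack (a sketch; vocab_size and max_len are placeholder names, and 128 is an arbitrary embedding size):
model = Sequential()
model.add(Embedding(vocab_size, 128, input_length=max_len))  # -> (batch, max_len, 128)
model.add(LSTM(10, return_sequences=True))                   # -> (batch, max_len, 10)
model.add(Dense(4, activation='softmax'))                    # -> (batch, max_len, 4)
Note that a Dense layer applied to a 3D input is applied independently to every timestep, so each word gets its own 4-way probability distribution.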
Another thing is your loss. For a multiclass classification task you should use some variation of categorical_crossentropy. In your case, where the target is a sequence of ints, you should use sparse_categorical_crossentropy:
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
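Putting it together, here is a minimal end-to-end sketch with toy data (two classes, as in your updated example; the data and names like word_to_idx are mine, not from your code). Note that the labels must be padded to the same length as the inputs, and that sparse targets for a sequence output are typically given a trailing dimension of 1:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM
from keras.preprocessing.sequence import pad_sequences

# Toy data: per-word integer class labels (0 or 1) for each sentence.
sentences = [['carrots', 'peas', 'beef'], ['coward', 'chicken']]
labels    = [[0, 0, 0], [1, 1]]

# Build the vocabulary, reserving index 0 for padding so mask_zero works.
vocab = sorted({w for s in sentences for w in s})
word_to_idx = {w: i + 1 for i, w in enumerate(vocab)}

max_len = max(len(s) for s in sentences)
x = pad_sequences([[word_to_idx[w] for w in s] for s in sentences],
                  maxlen=max_len, padding='post')          # (batch, max_len)
y = pad_sequences(labels, maxlen=max_len, padding='post')  # (batch, max_len)
y = np.expand_dims(y, -1)  # (batch, max_len, 1) for sparse targets

model = Sequential()
model.add(Embedding(len(vocab) + 1, 128, mask_zero=True))  # masks the padded zeros
model.add(LSTM(10, return_sequences=True))  # one output vector per timestep
model.add(Dense(2, activation='softmax'))   # per-timestep class probabilities
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='rmsprop', metrics=['accuracy'])
model.fit(x, y, batch_size=2, epochs=10)
The mask_zero=True flag keeps the padded timesteps from contributing to the loss, which is why index 0 is reserved for padding in the vocabulary.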
Upvotes: 1