user9170

Reputation: 1000

First-timer problems: multidimensional output

I am new to Keras and LSTM but not new to other NNs. The goal is to classify parts of a sentence into 4 mutually exclusive categories. I want to use the LSTM to assign each word of the sentence to one of the labels, using the context around the word. Ex:

[carrots peas beef chicken coward chicken ]

to:

[ food food food food person person ]

My input vectors are arrays of words which I run through an Embedding. I then want to feed the LSTM with the words and train it on the output classification, so that it learns which words in which contexts fall into which classes.

    x_in = [[2,6,3,74,45,...], [...], ...]
    y_in = [[0,0,0,1], [0,1,0,0], [...], ...]
    x_in_padded =  pad_sequences(x_in, maxlen=max_len, padding='post')
    model = Sequential()
    model.add(Embedding(len(words), 128))  # examples use 128 or 256...
    model.add(LSTM(4,  return_sequences=True))   #try a 4-word context
    model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

    print model.summary()
    model.fit(x_in_padded, y_in, batch_size=16, epochs=10)
    loss, accuracy = model.evaluate(x_in_padded, y_in, batch_size=16)

However I am getting:

ValueError: Error when checking target: 
expected dense_1 to have 3 dimensions, but got array with shape (968, 1)

968 is the number of sentences in x_in_padded and vectors in y_in. What am I doing wrong?
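For reference, a quick numpy-only sanity check of the shapes involved (the toy arrays below are illustrative, not my real data): with return_sequences=True the network emits one prediction per timestep, so the target must be a 3-D (samples, timesteps, classes) array. A Python list of per-sentence label lists with unequal lengths collapses into a 1-D object array instead, which is what Keras then complains about:

```python
import numpy as np

# Ragged per-sentence one-hot label lists (unequal lengths)
y_ragged = [[[0, 0, 0, 1], [0, 1, 0, 0]],
            [[1, 0, 0, 0]]]
arr = np.array(y_ragged, dtype=object)
print(arr.ndim)     # 1 -- an object array, not the 3-D target Keras needs

# After padding every sentence to the same length, the target
# becomes a proper 3-D (samples, timesteps, classes) array
max_len = 2
y_padded = [s + [[0, 0, 0, 0]] * (max_len - len(s)) for s in y_ragged]
y_arr = np.array(y_padded)
print(y_arr.shape)  # (2, 2, 4)
```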

** UPDATE **

I've been iterating on it and I still have issues with the embedding and LSTM layer dimensionality. Here's the code; I've made it a self-contained example. The current error, raised after the first epoch starts, is: ValueError: setting an array element with a sequence.

from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Embedding
from keras.layers import LSTM
from keras.utils import to_categorical
from keras.preprocessing.sequence import pad_sequences
from keras.utils.layer_utils import print_summary
import numpy as np

sentences = [
    ['the', 'imdb', 'review', 'data', 'does', 'have', 'a', 'one', 'dimensional', 'spatial', 'structure', 'in', 'the', 'sequence'],
    ['of', 'words', 'in', 'reviews', 'and', 'the', 'cnn', 'may', 'be', 'able', 'to', 'pick', 'out'],
    ['invariant', 'features', 'for', 'good', 'and', 'bad', 'sentiment', 'this', 'learned', 'spatial'],
    ['features', 'may', 'then', 'be', 'learned', 'as', 'sequences', 'by', 'an', 'lstm', 'layer']
]
outputs = [
    ['class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class2', 'class2', 'class2'],
    ['class2', 'class2', 'class2', 'class2', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class2', 'class2', 'class2'],
    ['class1', 'class1', 'class2', 'class2', 'class2', 'class2', 'class2', 'class1', 'class1', 'class1'],
    ['class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class1', 'class2', 'class2', 'class2', 'class2']
]

words = sorted(set(word for sentence in sentences for word in sentence))
x_in = [ [words.index(word) for word in sentence] for sentence in sentences ]

out_classes = {'class1': [1,0], 'class2': [0,1]}
y_in = [ [ out_classes[sentence[i]] for i in range(len(sentence))] for sentence in outputs]

max_len = max([len(sentence) for sentence in sentences])

x_in_padded =  pad_sequences(x_in, maxlen=max_len, padding='post')
x_in_padded = np.reshape(x_in_padded, (x_in_padded.shape[0], x_in_padded.shape[1], 1))  # add a feature dimension
print "x_in:"
print x_in_padded[2]
print x_in_padded.shape

print "y_in:"
y_in = np.array(y_in)

model = Sequential()
model.add(LSTM(4, return_sequences=False, input_shape=(None, 1)))
model.add(Dense(len(out_classes),  activation="softmax"))
model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

print_summary(model)
model.fit(x_in_padded, y_in, epochs=10)
loss, accuracy = model.evaluate(x_in_padded, y_in)

My current error is: ValueError: setting an array element with a sequence.
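For what it's worth, I can reproduce this error with plain numpy: the four label sequences have different lengths (14, 13, 10, 11), so np.array(y_in) cannot build a rectangular array. A sketch of one way around it (numpy only; the all-zero padding vector is an illustrative choice, not something Keras mandates) is to pad the label sequences to max_len, just like the inputs:

```python
import numpy as np

out_classes = {'class1': [1, 0], 'class2': [0, 1]}
# Label sequences of unequal length, mirroring the lengths above
y_in = [[out_classes['class1']] * 14,
        [out_classes['class2']] * 13,
        [out_classes['class1']] * 10,
        [out_classes['class2']] * 11]

max_len = max(len(seq) for seq in y_in)   # 14, matching the padded inputs

# Post-pad every label sequence with an all-zero vector so the
# result is a rectangular (samples, timesteps, classes) array
pad = [0, 0]
y_padded = np.array([seq + [pad] * (max_len - len(seq)) for seq in y_in])
print(y_padded.shape)   # (4, 14, 2)
```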

Upvotes: 1

Views: 328

Answers (1)

Marcin Możejko

Reputation: 40506

Two things here:

model.add(LSTM(4, return_sequences=True))   #try a 4-word context

This layer outputs values passed through the default tanh activation, which is not suitable for your task. Add a softmax output from a Dense layer on top of your network:

model.add(LSTM(10, return_sequences=True))   #10 is arbitrary - try other values.
model.add(Dense(4, activation='softmax')) # Output layer.

Another thing is your loss. For a multiclass classification task you should use some variation of categorical_crossentropy. In case your target is a sequence of ints, you should use sparse_categorical_crossentropy:

model.compile(loss='sparse_categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
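Putting both fixes together, a minimal self-contained sketch (the embedding size 16 and LSTM width 10 are arbitrary; since the toy targets here are one-hot vectors like yours, I use plain categorical_crossentropy — with integer targets you would use the sparse variant as above):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

vocab_size, max_len, n_classes = 50, 14, 2

# Toy data: padded word indices and one-hot per-word labels
x = np.random.randint(1, vocab_size, size=(4, max_len))
y = np.eye(n_classes)[np.random.randint(0, n_classes, size=(4, max_len))]

model = Sequential()
model.add(Embedding(vocab_size, 16))
model.add(LSTM(10, return_sequences=True))       # one output per timestep
model.add(Dense(n_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])

model.fit(x, y, epochs=1, verbose=0)
print(model.predict(x, verbose=0).shape)         # (4, 14, 2)
```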

Upvotes: 1
