Sam B.

Reputation: 3033

Expected activation_1 to have 3 dimensions, but got array with shape (126984, 67)

I'm writing a model that uses an LSTM to generate realistic text based on examples.

Here's the gist of the code:

# ...
path = 'lyrics.txt'
with io.open(path, encoding='utf-8') as f:
    text = f.read().lower()
print('corpus length:', len(text))

chars = sorted(list(set(text)))
print('total chars:', len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 140

step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i: i + maxlen])
    next_chars.append(text[i + maxlen])
print('nb sequences:', len(sentences))

print('Vectorization...')
x = np.zeros((len(sentences), maxlen, len(chars)), dtype=bool)
y = np.zeros((len(sentences), len(chars)), dtype=bool)
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1


# build the model: a single LSTM
print('Build model...')
model = Sequential()
model.add(LSTM(128, dropout_W=0.5, return_sequences=True, input_shape=(maxlen, len(chars))))
model.add(LSTM(128, dropout_W=0.5, return_sequences=True))
model.add(LSTM(128, dropout_W=0.5, return_sequences=True))
model.add(Dense(len(chars)))
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='adam')

I edited it to see the result of stacking multiple LSTM layers, and in return I'm getting this error:

Using TensorFlow backend.
corpus length: 381090
total chars: 67
nb sequences: 126984
Vectorization...
Build model...
char_lstm.py:55: UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(128, return_sequences=True, dropout=0.5, input_shape=(140, 67))`
  model.add(LSTM(128, dropout_W=0.5, return_sequences=True, input_shape=(maxlen, len(chars))))
char_lstm.py:56: UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(128, return_sequences=True, dropout=0.5)`
  model.add(LSTM(128, dropout_W=0.5, return_sequences=True))
char_lstm.py:57: UserWarning: Update your `LSTM` call to the Keras 2 API: `LSTM(128, return_sequences=True, dropout=0.5)`
  model.add(LSTM(128, dropout_W=0.5, return_sequences=True))
Traceback (most recent call last):
  File "char_lstm.py", line 110, in <module>
    callbacks=[print_callback])
  File "/usr/local/lib/python2.7/dist-packages/keras/models.py", line 1002, in fit
    validation_steps=validation_steps)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1630, in fit
    batch_size=batch_size)
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 1480, in _standardize_user_data
    exception_prefix='target')
  File "/usr/local/lib/python2.7/dist-packages/keras/engine/training.py", line 113, in _standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking target: expected activation_1 to have 3 dimensions, but got array with shape (126984, 67)

I believe the last layer, model.add(Dense(len(chars))), might be the source of the bug. I know what the code does, but after multiple shots in the dark I need an adequate solution and, more importantly, an understanding of how the solution relates to the bug.

Upvotes: 0

Views: 189

Answers (1)

nuric

Reputation: 11225

You are close: the problem is around Dense(len(chars)). Because you use return_sequences=True in the last LSTM as well, you are effectively returning a 3D tensor of shape (batch_size, maxlen, 128). Both Dense and softmax can handle higher-dimensional tensors, since they operate on the last dimension (axis=-1), but this causes them to return sequences as well. You have a many-to-many model, whereas your data is many-to-one. You have 2 options:

  1. You can remove return_sequences=True from the last LSTM to compress the context (the past tokens) into a single vector representation of size 128, then predict based on that.
  2. If you insist on using information from all past timesteps, then you need a Flatten() layer before you pass the output to Dense.
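A minimal sketch of option 1, using the Keras 2 API that the warnings in your output already suggest (imported here via `tensorflow.keras`; `maxlen` and `len(chars)` are hard-coded to the 140 and 67 from your run):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

maxlen, n_chars = 140, 67  # values taken from the question's output

model = Sequential()
model.add(LSTM(128, dropout=0.5, return_sequences=True,
               input_shape=(maxlen, n_chars)))
model.add(LSTM(128, dropout=0.5, return_sequences=True))
# No return_sequences here: the last LSTM emits only its final
# state, shape (batch_size, 128), matching the 2D target array.
model.add(LSTM(128, dropout=0.5))
model.add(Dense(n_chars, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

With the last layer returning a single vector, the model's output shape is (batch_size, 67), the same shape as your one-hot `y` array, so fitting no longer raises the dimension error.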

By the way, you can use Dense(len(chars), activation='softmax') to achieve the same effect in one line.
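For completeness, option 2 keeps return_sequences=True on every LSTM and flattens the sequence output before the final prediction (again a sketch with `maxlen` and the character count hard-coded from your run):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Flatten

maxlen, n_chars = 140, 67  # values taken from the question's output

model = Sequential()
model.add(LSTM(128, dropout=0.5, return_sequences=True,
               input_shape=(maxlen, n_chars)))
model.add(LSTM(128, dropout=0.5, return_sequences=True))
model.add(LSTM(128, dropout=0.5, return_sequences=True))
# Flatten the (batch_size, maxlen, 128) sequence output into
# (batch_size, maxlen * 128) so Dense predicts a single character.
model.add(Flatten())
model.add(Dense(n_chars, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
```

Note this makes the final Dense layer much larger (maxlen * 128 inputs per unit), which is why option 1 is the usual choice for character-level generation.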

Upvotes: 1
