The_Dude

Reputation: 39

LSTM error - 'logits and labels must have the same shape'

I have searched the other threads here regarding this error, but am unable to figure out the issue. I am trying to create an LSTM using a toy dataset with two predictors and three outcomes, setting the output layer to sigmoid so that each outcome label is given a probability between 0 and 1. My code:

import pandas as pd
import numpy as np
import tensorflow as tf

#create toy dataset
d = {'custID': [101,101,101,102,102,102,103,103,103],
     'X1': [1,1,1,0,0,0,1,1,0],
     'X2': [0,1,0,1,1,1,1,1,0],
     'y1': [0,0,1,0,1,1,0,0,1],
     'y2': [0,1,1,0,0,1,1,0,1],
     'y3': [0,0,0,0,0,1,0,1,0]}

data = pd.DataFrame(data=d)

#separate predictors (X) from outcomes (y)
X = data[['custID','X1','X2']]
y = data[['custID','y1','y2','y3']]

X.set_index('custID', inplace=True)
y.set_index('custID', inplace=True)

#reshape
X = X.values.reshape(3,3,2)
print(X)

y = y.values.reshape(3,3,3)
print(y)

#create LSTM
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, input_shape=(3,2)),
    tf.keras.layers.LSTM(12, return_sequences=False),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation='sigmoid')
])

model.compile(optimizer=tf.keras.optimizers.Adam(), loss='binary_crossentropy')
model.fit(X, y, epochs=10)

This yields the error ValueError: logits and labels must have the same shape ((None, 3) vs (None, 3, 3)). It works if I set the output layer activation to softmax, but that yields a probability distribution across all 3 labels that sums to 1, instead of an independent probability for each label.
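
For reference, a quick way to see where the shapes diverge (checking the same model as above) is to compare the model's output shape with the label shape:

#the model outputs (None, 3) while the labels have shape (3, 3, 3)
print(model.output_shape)  # (None, 3)
print(y.shape)             # (3, 3, 3)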

Upvotes: 0

Views: 943

Answers (1)

Anton Panchishin

Reputation: 3763

Fixing the 'same shape' issue

I think you want to change your model to this

#create LSTM
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, input_shape=(3,2)),
    tf.keras.layers.LSTM(12, return_sequences=True),
    tf.keras.layers.Dense(3, activation='sigmoid')
])

and if we look at the model summary

model.summary()

We get the following output

Layer (type)                 Output Shape              Param #   
=================================================================
dense_16 (Dense)             (None, 3, 6)              18        
_________________________________________________________________
lstm_8 (LSTM)                (None, 3, 12)             912       
_________________________________________________________________
dense_17 (Dense)             (None, 3, 3)              39        
=================================================================
Total params: 969
Trainable params: 969
Non-trainable params: 0

We can see that the data is flowing through the model in the correct shape.

Showing that this model works

Let's take the above model and train it

model.compile(optimizer=tf.keras.optimizers.Adam(), loss='binary_crossentropy')
dataSet = tf.data.Dataset.from_tensors((X,y)).repeat(1000)
model.fit(dataSet, epochs=5)
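
Here Dataset.from_tensors((X, y)) packs the whole toy set into a single element, and repeat(1000) feeds that same batch 1000 times per epoch, which is why each epoch below shows 1000 steps.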

This returns the following training results:

Epoch 1/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.2522
Epoch 2/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0142
Epoch 3/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0048
Epoch 4/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0023
Epoch 5/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0012

Looks like it trains up fantastically!
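
As a quick sanity check (a minimal sketch using the same X as above), the trained model now produces an independent sigmoid probability per label at every time step:

preds = model.predict(X)
print(preds.shape)  # (3, 3, 3): one probability per time step per label
print(preds[0])     # values lie between 0 and 1 and do not need to sum to 1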

I think the main issue was return_sequences=False, which collapsed the time dimension of your data. The Flatten layer was unnecessary, and once return_sequences is fixed, keeping Flatten would cause another shape issue.
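
To illustrate the difference (a minimal standalone sketch, using tf as imported above, not part of the model itself), compare the LSTM output shapes with and without return_sequences:

x = tf.random.normal((1, 3, 6))  # (batch, timesteps, features)
print(tf.keras.layers.LSTM(12, return_sequences=True)(x).shape)   # (1, 3, 12)
print(tf.keras.layers.LSTM(12, return_sequences=False)(x).shape)  # (1, 12)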

Let me know if this solves your problem.

Upvotes: 1
