Reputation: 39
I have searched the other threads here regarding this error, but am unable to figure out the issue. I am trying to create an LSTM using a toy dataset with two predictors and three outcomes, setting the output layer to sigmoid so that each outcome label is afforded a probability between 0 and 1. My code:
import pandas as pd
import numpy as np
import tensorflow as tf
#create toy dataset
d = {'custID': [101,101,101,102,102,102,103,103,103],
     'X1': [1,1,1,0,0,0,1,1,0],
     'X2': [0,1,0,1,1,1,1,1,0],
     'y1': [0,0,1,0,1,1,0,0,1],
     'y2': [0,1,1,0,0,1,1,0,1],
     'y3': [0,0,0,0,0,1,0,1,0]}
data = pd.DataFrame(data=d)
#separate preds (X) from outcomes (y)
X = data[['custID','X1','X2']]
y = data[['custID','y1','y2','y3']]
X.set_index('custID', inplace=True)
y.set_index('custID', inplace=True)
#reshape
X = X.values.reshape(3,3,2)
print(X)
y = y.values.reshape(3,3,3)
print(y)
#create LSTM
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, input_shape=(3,2)),
    tf.keras.layers.LSTM(12, return_sequences=False),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(3, activation='sigmoid')
])
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='binary_crossentropy')
model.fit(X, y, epochs=10)
This yields the error ValueError: logits and labels must have the same shape ((None, 3) vs (None, 3, 3)). It works if I set the output layer activation to softmax, but that yields probabilities across all 3 labels that sum to 1, instead of an independent probability for each label.
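To illustrate the distinction, here is a minimal sketch on a single made-up set of logits (the numbers are arbitrary): softmax normalizes across the three labels so the row sums to 1, while sigmoid scores each label independently.
import tensorflow as tf
logits = tf.constant([[2.0, 1.0, 0.1]])  # arbitrary example logits for the 3 labels
print(tf.keras.activations.softmax(logits).numpy())  # row sums to 1
print(tf.keras.activations.sigmoid(logits).numpy())  # each value independently in (0, 1)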
Upvotes: 0
Views: 943
Reputation: 3763
I think you want to change your model to this:
#create LSTM
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(6, input_shape=(3,2)),
    tf.keras.layers.LSTM(12, return_sequences=True),
    tf.keras.layers.Dense(3, activation='sigmoid')
])
And if we look at the model summary:
model.summary()
We get the following output:
Layer (type) Output Shape Param #
=================================================================
dense_16 (Dense) (None, 3, 6) 18
_________________________________________________________________
lstm_8 (LSTM) (None, 3, 12) 912
_________________________________________________________________
dense_17 (Dense) (None, 3, 3) 39
=================================================================
Total params: 969
Trainable params: 969
Non-trainable params: 0
We can see that the data is flowing through in the correct shape: the final Dense layer outputs (None, 3, 3), which matches the shape of the labels. Let's take the above model and train it:
model.compile(optimizer=tf.keras.optimizers.Adam(), loss='binary_crossentropy')
dataSet = tf.data.Dataset.from_tensors((X, y)).repeat(1000)
model.fit(dataSet, epochs=5)
This returns the following training results (from_tensors wraps the whole (X, y) pair as a single element, so repeat(1000) gives 1000 identical steps per epoch):
Epoch 1/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.2522
Epoch 2/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0142
Epoch 3/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0048
Epoch 4/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0023
Epoch 5/5
1000/1000 [==============================] - 2s 2ms/step - loss: 0.0012
Looks like it trains up. Fantastic!
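As an aside, since the shapes now line up, fitting on the arrays directly should work too; the tf.data pipeline above is just a convenient way to repeat the tiny dataset for more gradient steps per epoch. A minimal sketch:
model.fit(X, y, epochs=10)  # one step per epoch over the 3 sequences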
I think the main issue was return_sequences=False, which basically got rid of the second (timestep) dimension of your data, so the output could no longer match the (None, 3, 3) labels. The Flatten was unnecessary, and once return_sequences is fixed, keeping Flatten would collapse the output back to 2D and cause another shape mismatch.
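With return_sequences=True, the final Dense applies its sigmoid at every timestep, so each label gets its own independent probability. As a quick sanity check (a sketch, reusing the X from the question), the predictions should come back with shape (3, 3, 3) and values between 0 and 1:
preds = model.predict(X)
print(preds.shape)  # (3, 3, 3): one probability per label, per timestep, per customer
print(preds[0])     # per-label probabilities for the first customer's 3 timesteps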
Let me know if this solves your problem.
Upvotes: 1