I have the following code that creates an LSTM network using Keras with the TensorFlow backend. The model is built and compiled without errors.
import numpy as np
import pandas as pd
from sklearn import model_selection
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.layers.recurrent import LSTM
from keras.utils import np_utils
flights = {
'flight_stage': [1,0,1,1,0,0,1],
'scheduled_hour': [16,16,17,17,17,18,18],
'delay_category': [1,0,2,2,1,0,2]
}
columns = ['flight_stage', 'scheduled_hour', 'delay_category']
df = pd.DataFrame(flights, columns=columns)
X = df.drop('delay_category', axis=1)
y = df['delay_category']
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.25, random_state=42)
nb_features = X_train.shape[1]
nb_classes = y.nunique()
hidden_neurons = 32
timestamps = X_train.shape[0]
# Reshape input data to 3D array
X_train = X_train.values.reshape(1, X_train.shape[0], X_train.shape[1])
X_test = X_test.values.reshape(1, X_test.shape[0], X_test.shape[1])
y_train = np_utils.to_categorical(y_train, nb_classes)
y_test = np_utils.to_categorical(y_test, nb_classes)
model = Sequential()
model.add(LSTM(
    units=hidden_neurons,
    return_sequences=True,
    input_shape=(timestamps, nb_features)
))
model.add(Dropout(0.2))
model.add(Dense(activation='softmax', units=nb_classes))
model.compile(loss="categorical_crossentropy",
              optimizer='adadelta')
But when I start training the model, it fails:
history = model.fit(X_train, y_train, validation_split=0.25, epochs=500, batch_size=2, shuffle=True, verbose=0)
Error:
ValueError: Error when checking target: expected dense_19 to have 3 dimensions, but got array with shape (5, 3)
This error refers to the final Dense layer. I used model.summary() to get the exact dimensions: the output shape of the Dense layer is (None, 5, 3).
However, I do not understand why it has 3 dimensions or what None stands for (how did it appear in this last layer?).
Reputation: 18371
3 is the number of units in the last layer, i.e. the number of classes passed to the softmax activation.
5 is the number of timesteps. Because the LSTM is created with return_sequences=True, it returns one output per timestep, and the Dense layer is applied to each of them.
None is the batch dimension. It is left unspecified so that the model can accept batches of any size, each element having shape [5, 3] at the last layer.
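To see where the three dimensions come from, here is a minimal NumPy sketch of the shape arithmetic only. It does not use Keras: the helper dense_softmax and the random weights are hypothetical stand-ins, assuming a Dense layer acts on the last axis of its input.

```python
import numpy as np

# Sizes mirroring the question: 5 timesteps, 32 LSTM units, 3 classes.
timesteps, hidden_neurons, nb_classes = 5, 32, 3


def dense_softmax(x, kernel, bias):
    """Mimic a softmax Dense layer: it only transforms the last axis."""
    logits = x @ kernel + bias  # (..., hidden) @ (hidden, classes)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
kernel = rng.normal(size=(hidden_neurons, nb_classes))  # hypothetical weights
bias = np.zeros(nb_classes)

# Stand-in for the LSTM output with return_sequences=True:
# one hidden vector per timestep, for any number of samples in the batch.
for batch in (1, 4):
    lstm_out = rng.normal(size=(batch, timesteps, hidden_neurons))
    out = dense_softmax(lstm_out, kernel, bias)
    print(batch, out.shape)  # the batch axis varies, the trailing (5, 3) does not
```

The leading axis changes with the batch while the trailing (5, 3) stays fixed, which is what (None, 5, 3) expresses in model.summary().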
X_train shape: (1, 5, 2),
X_test shape: (1, 2, 2),
y_train shape: (5,3),
y_test shape: (2,3)
Looking at the data shapes, there is clearly a mismatch between the batch size of the features and that of the labels. The leftmost number in the features shape X and in the labels shape y must be equal: it is the batch size (the number of samples).
X_train: '1', 5, 2 => batch size of 1
y_train: '5', 3 => batch size of 5
So X_train holds one sample while y_train holds five (and likewise 1 versus 2 for the test set).
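The mismatch can be reproduced with placeholder arrays of the same shapes. The sketch below uses NumPy only; the label values are made up, and the reshape is one possible fix under the assumption that each timestep of the single sequence carries its own label.

```python
import numpy as np

# Placeholder data with the shapes reported above.
X_train = np.zeros((1, 5, 2))       # (samples, timesteps, features)
labels = np.array([1, 0, 2, 2, 1])  # hypothetical class ids, one per timestep
y_train = np.eye(3)[labels]         # one-hot encoding, shape (5, 3)

# The leading axes disagree: 1 sample of features against 5 rows of labels.
print(X_train.shape[0], y_train.shape[0])

# Assumption: one label per timestep, so the labels form a single sequence.
y_train_seq = y_train.reshape(1, 5, 3)  # (samples, timesteps, classes)
print(X_train.shape[0] == y_train_seq.shape[0])
```

With a target of shape (1, 5, 3), the labels line up with a (None, 5, 3) model output. If instead there is one label per sequence, the target would need shape (1, 3).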
Also, to reconcile the 3D output of the LSTM layer with 2D labels, one can insert a Flatten layer before the last Dense layer:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, LSTM

nb_classes = 3
hidden_neurons = 32
model = Sequential()
model.add(LSTM(
    units=hidden_neurons,
    return_sequences=True,
    input_shape=(5, 2)
))
model.add(Dropout(0.2))
# Collapse (timesteps, units) into one vector so that Dense outputs (None, 3)
model.add(Flatten())
model.add(Dense(activation='softmax', units=nb_classes))
model.compile(loss='categorical_crossentropy',
              optimizer='adam')
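As a rough check of what Flatten changes, the following NumPy sketch follows the shapes through the last two layers. It is not Keras code: the zero-filled arrays and the kernel are hypothetical stand-ins, used only for the shape arithmetic.

```python
import numpy as np

batch, timesteps, hidden_neurons, nb_classes = 1, 5, 32, 3

# Stand-in for the LSTM output with return_sequences=True.
lstm_out = np.zeros((batch, timesteps, hidden_neurons))  # (1, 5, 32)

# Flatten keeps the batch axis and merges every other axis.
flat = lstm_out.reshape(batch, -1)                       # (1, 160)

# The Dense layer now receives 2D input, so its output is 2D as well.
kernel = np.zeros((timesteps * hidden_neurons, nb_classes))
logits = flat @ kernel                                   # (1, 3)

print(lstm_out.shape, flat.shape, logits.shape)
```

With Flatten in place the model produces one prediction per sample, so the target passed to fit would need shape (1, 3), i.e. one label for the whole sequence rather than one per timestep.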