Reputation: 45
I have a saved model that I trained on a small text (messaging) corpus, and I'm trying to use that same model to predict positive or negative sentiment (i.e. binary classification) on another corpus. I based the NLP model on the Google developers machine learning guide for text classification, which you can review here if you think it useful (I used option A throughout).
I keep getting an input shape error. I know the error means I have to reshape the input to fit the expected shape, but the data I want to predict on is not of that size. The error statement is:
ValueError: Error when checking input: expected dropout_8_input to have shape (519,) but got array with shape (184,)
The reason the model expects the shape (519,) is that during training the corpus fed into the first dropout layer (in TF-IDF vectorized form) has that width:
print(x_train.shape)  # (454, 519)
I'm new to ML, but it doesn't make sense to me that all the data I predict on after optimizing a model has to be the same shape as the data that was used to train it. Has anyone experienced an issue similar to this? Is there something I'm missing in how to train the model so that a different-sized input can be predicted on? Or am I misunderstanding how models are to be used for class prediction?
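To show concretely what I mean, here is a minimal sketch of how I load the saved model (written to MCTR2.h5 by the training code below) and hit the mismatch; the zero arrays are just placeholders for real vectorized text:

import numpy as np
import tensorflow as tf

# Load the model saved at the end of train_ngram_model() below.
model = tf.keras.models.load_model('MCTR2.h5')
print(model.input_shape)  # (None, 519) -- only the batch dimension is free

model.predict(np.zeros((1, 519)))    # accepted
# model.predict(np.zeros((1, 184)))  # raises the ValueError quoted above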
I am basing the model training on the following functions:
import tensorflow as tf
from tensorflow.python.keras import models
from tensorflow.python.keras.layers import Dense, Dropout, Activation, Flatten
from tensorflow.python.keras.layers import Convolution2D, MaxPooling2D
def mlp_model(layers, units, dropout_rate, input_shape, num_classes):
    """Creates an instance of a multi-layer perceptron model.

    # Arguments
        layers: int, number of `Dense` layers in the model.
        units: int, output dimension of the layers.
        dropout_rate: float, percentage of input to drop at Dropout layers.
        input_shape: tuple, shape of input to the model.
        num_classes: int, number of output classes.

    # Returns
        An MLP model instance.
    """
    op_units, op_activation = _get_last_layer_units_and_activation(num_classes)
    model = models.Sequential()
    model.add(Dropout(rate=dropout_rate, input_shape=input_shape))
    # print(input_shape)
    for _ in range(layers - 1):
        model.add(Dense(units=units, activation='relu'))
        model.add(Dropout(rate=dropout_rate))
    model.add(Dense(units=op_units, activation=op_activation))
    return model
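For reference, this is roughly how the function ends up being called for my data (an illustrative call using the defaults below and the 519-column TF-IDF matrix), which produces the architecture shown further down:

model = mlp_model(layers=2,
                  units=64,
                  dropout_rate=0.2,
                  input_shape=(519,),  # x_train.shape[1:] from the TF-IDF matrix
                  num_classes=2)
model.summary()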
def train_ngram_model(data,
                      learning_rate=1e-3,
                      epochs=1000,
                      batch_size=128,
                      layers=2,
                      units=64,
                      dropout_rate=0.2):
    """Trains n-gram model on the given dataset.

    # Arguments
        data: tuples of training and test texts and labels.
        learning_rate: float, learning rate for training model.
        epochs: int, number of epochs.
        batch_size: int, number of samples per batch.
        layers: int, number of `Dense` layers in the model.
        units: int, output dimension of Dense layers in the model.
        dropout_rate: float, percentage of input to drop at Dropout layers.

    # Raises
        ValueError: If validation data has label values which were not seen
            in the training data.

    # Reference
        For tuning hyperparameters, please visit the following page for
        further explanation of each argument:
        https://developers.google.com/machine-learning/guides/text-classification/step-5
    """
    # Get the data.
    (train_texts, train_labels), (val_texts, val_labels) = data

    # Verify that validation labels are in the same range as training labels.
    num_classes = get_num_classes(train_labels)
    unexpected_labels = [v for v in val_labels if v not in range(num_classes)]
    if len(unexpected_labels):
        raise ValueError('Unexpected label values found in the validation set:'
                         ' {unexpected_labels}. Please make sure that the '
                         'labels in the validation set are in the same range '
                         'as training labels.'.format(
                             unexpected_labels=unexpected_labels))

    # Vectorize texts.
    x_train, x_val = ngram_vectorize(
        train_texts, train_labels, val_texts)

    # Create model instance.
    model = mlp_model(layers=layers,
                      units=units,
                      dropout_rate=dropout_rate,
                      input_shape=x_train.shape[1:],
                      num_classes=num_classes)

    # num_classes determines which loss and activation to use.
    # Compile model with learning parameters.
    if num_classes == 2:
        loss = 'binary_crossentropy'
    else:
        loss = 'sparse_categorical_crossentropy'
    optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
    model.compile(optimizer=optimizer, loss=loss, metrics=['acc'])

    # Create callback for early stopping on validation loss. If the loss does
    # not decrease in two consecutive tries, stop training.
    callbacks = [tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=2)]

    # Train and validate model.
    history = model.fit(
        x_train,
        train_labels,
        epochs=epochs,
        callbacks=callbacks,
        validation_data=(x_val, val_labels),
        verbose=2,  # Logs once per epoch.
        batch_size=batch_size)

    # Print results.
    history = history.history
    print('Validation accuracy: {acc}, loss: {loss}'.format(
        acc=history['val_acc'][-1], loss=history['val_loss'][-1]))

    # Save model.
    model.save('MCTR2.h5')
    return history['val_acc'][-1], history['val_loss'][-1]
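For completeness, ngram_vectorize comes from the same guide. As I understand it, it fits a TfidfVectorizer (and a feature selector) on the training texts and only transforms the validation texts, which is what fixes the feature width at 519. A simplified sketch of that behaviour (not the exact guide code; the helper name and top_k value here are only illustrative):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, f_classif

def ngram_vectorize_sketch(train_texts, train_labels, val_texts, top_k=20000):
    """Simplified stand-in for the guide's ngram_vectorize()."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2), analyzer='word')
    # Fit the vocabulary on the training corpus only.
    x_train = vectorizer.fit_transform(train_texts)
    # Validation (and any future prediction) texts are only transformed,
    # so they get exactly the same number of columns as x_train.
    x_val = vectorizer.transform(val_texts)

    # Keep the most informative features.
    selector = SelectKBest(f_classif, k=min(top_k, x_train.shape[1]))
    selector.fit(x_train, train_labels)
    x_train = selector.transform(x_train).astype('float32')
    x_val = selector.transform(x_val).astype('float32')
    return x_train, x_val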
From this I get the architecture of the model to be:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dropout (Dropout) (None, 519) 0
_________________________________________________________________
dense (Dense) (None, 64) 33280
_________________________________________________________________
dropout_1 (Dropout) (None, 64) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 65
=================================================================
Total params: 33,345
Trainable params: 33,345
Non-trainable params: 0
_________________________________________________________________
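The parameter counts already bake the 519-column input into the first Dense layer's weights (just the arithmetic, for reference):

# dense (Dense):   (input_dim + 1) * units = (519 + 1) * 64 = 33,280 params
# dense_1 (Dense): (64 + 1) * 1 = 65 params
# Total: 33,345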
Upvotes: 0
Views: 1198
Reputation: 301
For dimensions to be variable in TensorFlow, they need to be specified as None.
The first dimension is the batch_size, which is why that's generally always None, but typically a batch of sequence data will have the shape (batch_size, sequence_length, num_features). So a single sequence is usually 2D, with the length being variable but the number of features of each "token" being fixed.
It appears you are feeding your model 1D vectors, and Dense layers have a fixed input shape. If you want to model variable-length sequences, you have to build your model using layers which accommodate that (e.g., convolution, LSTM).
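For example (a generic sketch, not tied to the asker's data), a sequence model can leave the length dimension as None, while a Dense-on-vector model bakes the feature count into its weights:

import tensorflow as tf

num_features = 8  # per-token feature size (fixed)

# Variable-length input: the sequence_length dimension is None.
seq_model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=(None, num_features)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
print(seq_model.input_shape)  # (None, None, 8)

# Fixed-length input: the Dense weights have one row per input feature.
vec_model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(519,)),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
print(vec_model.input_shape)  # (None, 519)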
Upvotes: 1