Laure D

Reputation: 887

Error when checking model input keras when predicting new results

I am trying to use a Keras model I built on new data, but I get an input-shape error when calling predict.

Here's my code for the model:

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, Dense, Activation

def build_model(max_features, maxlen):
    """Build LSTM model"""
    model = Sequential()
    model.add(Embedding(max_features, 128, input_length=maxlen))
    model.add(LSTM(128))
    model.add(Dropout(0.5))
    model.add(Dense(1))
    model.add(Activation('sigmoid'))

    model.compile(loss='binary_crossentropy',
                  optimizer='rmsprop')

    return model

And my code to predict on my new data:

import pickle
import numpy as np
from keras.models import load_model
from keras.preprocessing import sequence

LSTM_model = load_model('LSTMmodel.h5')
data = pickle.load(open('traindata.pkl', 'rb'))


#### LSTM ####

# Extract data and labels
X = [x[1] for x in data]
labels = [x[0] for x in data]

# Generate a dictionary of valid characters
valid_chars = {x:idx+1 for idx, x in enumerate(set(''.join(X)))}

max_features = len(valid_chars) + 1
maxlen = np.max([len(x) for x in X])

# Convert characters to int and pad
X = [[valid_chars[y] for y in x] for x in X]
X = sequence.pad_sequences(X, maxlen=maxlen)

# Convert labels to 0-1
y = [0 if x == 'benign' else 1 for x in labels]


y_pred = LSTM_model.predict(X)

The error I get when running this code:

ValueError: Error when checking input: expected embedding_1_input to have shape (57,) but got array with shape (36,)

My error comes from maxlen: for my training data maxlen=57, but with my new data maxlen=36.

So I tried setting maxlen=57 in my prediction code, but then I get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[31,53] = 38 is not in [0, 38)
     [[Node: embedding_1/embedding_lookup = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](embedding_1/embeddings/read, embedding_1/Cast, embedding_1/embedding_lookup/axis)]]

What should I do in order to resolve these issues? Change my embedding layer?

Upvotes: 1

Views: 784

Answers (1)


Reputation: 33410

Either set the input_length of the Embedding layer to the maximum length you expect to see across your datasets, or just pass the same maxlen value you used when constructing the model to pad_sequences. In that case any sequence shorter than maxlen will be padded and any sequence longer than maxlen will be truncated.
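The padding/truncation behavior can be sketched in plain NumPy (a toy stand-in for what sequence.pad_sequences does with its default settings; pad_to_fixed_length is a hypothetical helper, not a Keras function):

```python
import numpy as np

def pad_to_fixed_length(seqs, maxlen):
    """Left-pad with zeros / left-truncate to maxlen, mimicking
    the defaults of Keras' sequence.pad_sequences."""
    out = np.zeros((len(seqs), maxlen), dtype=int)
    for i, s in enumerate(seqs):
        trunc = s[-maxlen:]                  # keep at most the last maxlen items
        out[i, maxlen - len(trunc):] = trunc # right-align, zeros on the left
    return out

# One short and one overly long sequence; maxlen fixed at 57 (the training value)
X_new = [[1, 2, 3], [4, 5] * 40]
X_padded = pad_to_fixed_length(X_new, maxlen=57)
print(X_padded.shape)  # (2, 57) regardless of the input lengths
```

Because every row comes out with length 57, the input shape now matches what the Embedding layer was built with.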

Further, make sure that the features you use are the same at train and test time (i.e. the character-to-index mapping, and therefore the vocabulary size, must not change). Rebuilding valid_chars from the new data is what produces an index of 38, outside the trained embedding's range [0, 38).
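Concretely, that means saving valid_chars and maxlen at training time and loading them at prediction time instead of recomputing them from the new data. A minimal sketch, assuming a hypothetical file name char_index.pkl and a toy mapping:

```python
import pickle

# --- training time: persist the mapping and maxlen alongside the model ---
valid_chars = {'a': 1, 'b': 2, 'c': 3}   # toy example; really built from training data
maxlen = 57                              # really np.max over training sequence lengths
with open('char_index.pkl', 'wb') as f:
    pickle.dump({'valid_chars': valid_chars, 'maxlen': maxlen}, f)

# --- prediction time: load instead of rebuilding from the new data ---
with open('char_index.pkl', 'rb') as f:
    saved = pickle.load(f)
valid_chars, maxlen = saved['valid_chars'], saved['maxlen']

# Characters unseen during training fall back to 0 (the padding index)
# instead of producing an out-of-range embedding index
X_new = ['abz']
X_encoded = [[valid_chars.get(ch, 0) for ch in s] for s in X_new]
print(X_encoded)  # [[1, 2, 0]]
```

Using dict.get with a default of 0 is one design choice for handling unseen characters; reserving a dedicated "unknown" index inside max_features is another.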

Upvotes: 1
