Keras / Tensorflow: Predict Using tf.data.Dataset API

I'm using Keras with a Tensorflow backend for building a model for this problem: https://www.kaggle.com/cfpb/us-consumer-finance-complaints (just practicing).

I trained my Keras model using the tf.data.Dataset API. Now I have a Pandas DataFrame, df_testing, whose columns are complaint (strings) and label (also strings), and I want to predict on these new samples. I create a tf.data.Dataset object, perform preprocessing, make an Iterator, and call predict on my model:

data = df_testing["complaint"].values
labels = df_testing["label"].values

dataset = tf.data.Dataset.from_tensor_slices((data))
dataset = dataset.map(lambda x: ({'reviews': x}))
dataset = dataset.batch(self.batch_size).repeat()
dataset = dataset.map(lambda x: self.preprocess_text(x, self.data_table))
dataset = dataset.map(lambda x: x['reviews'])
dataset = dataset.make_initializable_iterator()

My training used a tf.data.Dataset where each element was of the form ({'reviews': "movie was great"}, "positive") so I'm mimicking that here for prediction. Also, my preprocessing just turns my string into a Tensor of integers.

My first attempt is:

preds = model.predict(dataset)

but it fails with:

ValueError: When using iterators as input to a model, you should specify the `steps` argument.

So I modify this call to be:

preds = model.predict(dataset, steps=3)

But now I get back:

ValueError: Please provide data as a list or tuple of 2 elements  - input and target pair. Received Tensor("IteratorGetNext_2:0", shape=(?, 100), dtype=int32)

What am I doing incorrectly here? I shouldn't have to provide a tuple of 2 elements when predicting (I shouldn't need the label).

Thanks for any help you can offer!

Upvotes: 9

Answers (2)

ot226

Reputation: 328

The following code worked for me (tested on tensorflow 1.10.0):

[TLDR] Pass an empty dictionary as a dummy input and specify the number of steps:

model.predict(x={},steps=4)

Full code:

import numpy as np
import tensorflow as tf
from tensorflow.data import Dataset
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model


# dummy data:
x = np.arange(4).reshape(-1, 1).astype('float32')
y = np.arange(5, 9).reshape(-1, 1).astype('float32')

# build the Datasets
ds_x = Dataset.from_tensor_slices(x).repeat().batch(4)
it_x = ds_x.make_one_shot_iterator()

ds_y = Dataset.from_tensor_slices(y).repeat().batch(4)
it_y = ds_y.make_one_shot_iterator()


# build, compile, and train the model
input_vals = Input(tensor=it_x.get_next())
output = Dense(1, activation='relu')(input_vals)
model = Model(inputs=input_vals, outputs=output)
model.compile('rmsprop', 'mse', target_tensors=[it_y.get_next()])
model.fit(steps_per_epoch=1, epochs=5, verbose=2)

# infer using the dataset
model.predict(x={},steps=4)

Upvotes: 2

lmartens

Reputation: 1512

What version of Keras are you on? I cannot find that specific error message in the code base, but I think I found where it used to be.

Here's the error in a version of the code that I think is close to the version you're running: commit

And here's the updated version of that error: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/engine/training_eager.py#L464

The conditions of the input validation have changed (in the newest version your input would be accepted), but what's relevant is that the error message is now much clearer:

raise ValueError(
    'Please provide data as a list or tuple of 1, 2, or 3 elements '
    ' - `(input)`, or `(input, target)`, or `(input, target,'
    'sample_weights)`. Received %s. We do not use the `target` or'
    '`sample_weights` value here.' % inputs.output_shapes)

The target value is never used in the predict function, so it can be anything. Looking at the rest of the function, next_element[1] is never used.

[TLDR] Using your current version, add a dummy target value to the data, or update your Keras.
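
As a rough sketch of the dummy-target workaround, assuming TensorFlow 2.x, where predict accepts a tf.data.Dataset directly: the toy model, the random integer features, and the constant 0 target below are all made up for illustration and are not the asker's actual setup. On an older 1.x release the same map step would presumably go before the iterator is created, but I have not verified that there.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for the preprocessed complaints:
# 8 samples, each a sequence of 100 integer token ids.
features = np.random.randint(0, 50, size=(8, 100)).astype("int32")

# Toy model for illustration only (not the asker's architecture).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(100,), dtype="int32"),
    tf.keras.layers.Embedding(input_dim=50, output_dim=4),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Pair every input with a placeholder target so each element is an
# (input, target) tuple; predict() does not use the target.
with_dummy = tf.data.Dataset.from_tensor_slices(features)
with_dummy = with_dummy.map(lambda x: (x, tf.constant(0, dtype=tf.int32)))
with_dummy = with_dummy.batch(4)

# 8 samples / batch size 4 = 2 steps.
preds = model.predict(with_dummy, steps=2)

# For comparison: the same data without any target.
inputs_only = tf.data.Dataset.from_tensor_slices(features).batch(4)
plain = model.predict(inputs_only)

print(preds.shape)
print(np.allclose(preds, plain, atol=1e-6))
```

Both calls should return the same predictions, which is the point: the placeholder target only satisfies the input validation and has no effect on the output.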

Upvotes: 3
