vortex

Reputation: 99

Tensorflow serving ml engine online prediction json file format

I would like to save a Tensorflow model to ml-engine on GCP, and to make an online prediction.
I have successfully created the model on the ml-engine, however, I am struggling to make the input JSON string feed into the model. Here is the code and data, credit goes to Jose Portilla from his Tensorflow course on Udemy.

I used the gcloud command for prediction:

gcloud ml-engine predict --model='lstm_test' --version 'v3' --json-instances ./test.json

test.json content:

{"inputs":[1,2,3,4,5,6,7,8,9,10,11,12]}

Errors that I got:

{ "error": "Prediction failed: Error during model execution: AbortionError(code=StatusCode.INVALID_ARGUMENT, details=\"You must feed a value for placeholder tensor 'Placeholder_2' with dtype float and shape [?,12,1]\n\t [[Node: Placeholder_2 = Placeholder_output_shapes=[[?,12,1]], dtype=DT_FLOAT, shape=[?,12,1], _device=\"/job:localhost/replica:0/task:0/device:CPU:0\"]]\")" }

Upvotes: 0

Views: 933

Answers (1)

rhaertel80

Reputation: 8389

Generally speaking, using an example proto as input is not the preferred method for using the CloudML service. Instead, we'll directly use a placeholder.

Also, generally speaking, you should create a clean serving graph, so I would also suggest the following change:

def build_graph(x):
  # All the code shared between training and prediction, given input x
  ...

  outputs = ...

  # Make sure both the training and prediction graphs create a Saver.
  saver = tf.train.Saver()

  return outputs, saver

# Train as usual, then rebuild a clean graph for prediction:
with tf.Graph().as_default() as prediction_graph:
  x = tf.placeholder(tf.float32, [None, num_time_steps, num_inputs])
  outputs, saver = build_graph(x)

with tf.Session(graph=prediction_graph) as sess:
  sess.run([tf.local_variables_initializer(), tf.tables_initializer()])
  # 'latest' should point at your training checkpoint,
  # e.g. latest = tf.train.latest_checkpoint(checkpoint_dir)
  saver.restore(sess, latest)

  # simple_save is a much simpler interface for saving models.
  # It must be called while the session is still open.
  tf.saved_model.simple_save(
      sess,
      export_dir=SaveModel_folder,
      inputs={"x": x},
      outputs={"y": outputs}
  )

Now, the file you use with gcloud should look something like this:

[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
[[2, 2, 2, 2], [2, 2, 2, 2], [2, 2, 2, 2]]

This sends a batch of two instances (one instance/example per line), and assumes num_inputs is 4 and num_time_steps is 3. For the placeholder in your question, with shape [?, 12, 1], each line would instead be a 12x1 nested list.
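As a sketch (using the hypothetical values from the example above, not the asker's actual data), such a file can be written with the standard json module, one instance per line:

```python
import json

# Hypothetical batch of two instances, each with 3 time steps of 4 inputs
# (num_time_steps=3, num_inputs=4, matching the example lines above).
instances = [
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
    [[2, 2, 2, 2], [2, 2, 2, 2], [2, 2, 2, 2]],
]

# Write newline-delimited JSON: one instance per line, as gcloud expects.
with open("test.json", "w") as f:
    for instance in instances:
        f.write(json.dumps(instance) + "\n")
```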

One more important caveat: gcloud's file format is slightly different from the full request body you would send using a traditional client (e.g. JavaScript, Python, curl, etc.). The body of the request corresponding to the same file above is:

{
  "instances": [
    [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]],
    [[2, 2, 2, 2], [2, 2, 2, 2], [2, 2, 2, 2]]
  ]
}

Basically, each line in the gcloud file becomes an entry in the "instances" array.
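That conversion can be sketched with a small helper (hypothetical, not part of gcloud itself): read the newline-delimited file and wrap the parsed lines in an "instances" array.

```python
import json

def gcloud_file_to_request_body(path):
    """Convert a gcloud --json-instances file into a full request body:
    each non-empty line becomes one entry in the 'instances' array."""
    with open(path) as f:
        instances = [json.loads(line) for line in f if line.strip()]
    return {"instances": instances}
```

The resulting dict can be serialized with json.dumps and sent as the body of a POST to the prediction endpoint.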

Upvotes: 3
