Reputation: 1
I'm serving a BERT model with TensorFlow Serving (TFServing) and want to extract the hidden states via the REST API. When using the model in Google Colab, I can run inference just fine with:
inputs = {
"input_ids": input_ids,
"attention_mask": input_mask,
"token_type_ids": input_type_ids
}
test_output = model(inputs)
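For context, the three tensors above come from tokenizing the input text, roughly like this (a sketch assuming the HuggingFace BertTokenizer; the padded length of 5 matches the shapes in the SignatureDef further down, and the exact names are illustrative):
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("some example text", max_length=5, padding="max_length",
                    truncation=True, return_tensors="tf")
input_ids = encoded["input_ids"]            # tf.int32, shape (1, 5)
input_mask = encoded["attention_mask"]      # tf.int32, shape (1, 5)
input_type_ids = encoded["token_type_ids"]  # tf.int32, shape (1, 5)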
I then save the model like this:
tf.saved_model.save(model, model_save_path)
Inspecting the saved model with saved_model_cli gives the following output:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_ids'] tensor_info:
dtype: DT_INT32
shape: (-1, 5)
name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
outputs['hidden_states_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:0
outputs['hidden_states_10'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:1
outputs['hidden_states_11'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:2
outputs['hidden_states_12'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:3
outputs['hidden_states_13'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:4
outputs['hidden_states_2'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:5
outputs['hidden_states_3'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:6
outputs['hidden_states_4'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:7
outputs['hidden_states_5'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:8
outputs['hidden_states_6'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:9
outputs['hidden_states_7'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:10
outputs['hidden_states_8'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:11
outputs['hidden_states_9'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:12
outputs['last_hidden_state'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:13
outputs['pooler_output'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 768)
name: StatefulPartitionedCall:14
Method name is: tensorflow/serving/predict
Defined Functions:
Function Name: '__call__'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: True
Option #2
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: False
Function Name: '_default_save_signature'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids')}
Function Name: 'call_and_return_all_conditional_losses'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: True
Option #2
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: False
Function Name: 'serving'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids'), 'attention_mask': TensorSpec(shape=(None, None), dtype=tf.int32, name='attention_mask'), 'token_type_ids': TensorSpec(shape=(None, None), dtype=tf.int32, name='token_type_ids')}
For the API call, I construct the request input to match what the model expects during normal inference, following the TF Serving REST API documentation (https://www.tensorflow.org/tfx/serving/api_rest):
import json
import requests

inference_url = "http://localhost:8501/v1/models/<my_model_name>:predict"
data = {
    "instances": [{
        "input_ids": input_ids.numpy().tolist(),
        "attention_mask": attention_mask.numpy().tolist(),
        "token_type_ids": token_type_id.numpy().tolist()
    }]
}
headers = {"content-type": "application/json"}
response = requests.post(inference_url, headers=headers, data=json.dumps(data))
The problem I'm facing is that when calling the API endpoint:
/v1/models/<my_model_name>:predict
It seems as if the model is not expecting the parameters "attention_mask" and "token_type_ids". Even though the "Function Name: 'serving'" section above shows that the model should accept all three of "input_ids", "attention_mask", and "token_type_ids", I still get the following error from the REST API:
{
"error": "Failed to process element: 0 key: attention_mask of 'instances' list. Error: Invalid argument: JSON object: does not have named input: attention_mask"
}
To me it looks like it may have something to do with the SignatureDef: the serving_default signature seems to expect only "input_ids", even though the model I am saving in Google Colab expects a dictionary with all three of "input_ids", "attention_mask", and "token_type_ids".
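If it helps, the signature TF Serving has actually loaded can also be double-checked through its REST metadata endpoint (a quick sanity-check sketch, using the same placeholder model name and port as above):
import json
import requests

# Ask TF Serving which named inputs/outputs the loaded signature exposes.
metadata_url = "http://localhost:8501/v1/models/<my_model_name>/metadata"
metadata = requests.get(metadata_url).json()
# The response contains the signature_def with its named inputs.
print(json.dumps(metadata, indent=2))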
Have I saved the model wrong somehow? Can someone give me a hint towards what I am doing wrong?
Many thanks in advance!
Upvotes: 0
Views: 1647
Reputation:
Please check the linked answer and format the input data accordingly: Debugging TensorFlow serving on BERT model
Try the below snippet for the API call:
import json
import requests
inference_url = "http://localhost:8501/v1/models/<my_model_name>:predict"
data = json.dumps({
    "signature_name": "serving_default",
    "instances": [{"input_ids": [input_ids.numpy().tolist()],
                   "attention_mask": [attention_mask.numpy().tolist()],
                   "token_type_ids": [token_type_id.numpy().tolist()]}]
})
headers = {"content-type": "application/json"}
response = requests.post(inference_url, data=data, headers=headers)
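If reformatting the request alone does not help, the error suggests the serving_default signature itself only exposes input_ids. A hedged alternative (a sketch based on the saved_model_cli output in the question, not something tested against this exact model) is to re-export the SavedModel using the model's traced serving function, which accepts all three tensors, as the serving signature:
import tensorflow as tf

# Hypothetical sketch: use the traced 'serving' function (shown in the
# saved_model_cli output with input_ids, attention_mask and token_type_ids)
# as the serving_default signature when re-exporting the model.
tf.saved_model.save(
    model,
    model_save_path,
    signatures={"serving_default": model.serving},
)
After re-exporting and reloading the model in TF Serving, the predict endpoint should in principle accept attention_mask and token_type_ids as named inputs as well.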
Upvotes: 0