Reputation: 1
I'm serving a BERT model with TensorFlow Serving (TFServing) and want to extract the hidden states via the REST API. When using the model in Google Colab, I can run inference just fine with:
inputs = {
"input_ids": input_ids,
"attention_mask": input_mask,
"token_type_ids": input_type_ids
}
test_output = model(inputs)
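For context, the three tensors above come from tokenizing the input text, roughly like this (a sketch assuming the HuggingFace BertTokenizer; the padded length of 5 matches the shapes in the SignatureDef further down, and the exact names are illustrative):
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoded = tokenizer("some example text", max_length=5, padding="max_length",
                    truncation=True, return_tensors="tf")
input_ids = encoded["input_ids"]            # tf.int32, shape (1, 5)
input_mask = encoded["attention_mask"]      # tf.int32, shape (1, 5)
input_type_ids = encoded["token_type_ids"]  # tf.int32, shape (1, 5)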
I then save the model like this:
tf.saved_model.save(model, model_save_path)
Inspecting the saved model with saved_model_cli gives the following output:
MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:
signature_def['__saved_model_init_op']:
The given SavedModel SignatureDef contains the following input(s):
The given SavedModel SignatureDef contains the following output(s):
outputs['__saved_model_init_op'] tensor_info:
dtype: DT_INVALID
shape: unknown_rank
name: NoOp
Method name is:
signature_def['serving_default']:
The given SavedModel SignatureDef contains the following input(s):
inputs['input_ids'] tensor_info:
dtype: DT_INT32
shape: (-1, 5)
name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
outputs['hidden_states_1'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:0
outputs['hidden_states_10'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:1
outputs['hidden_states_11'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:2
outputs['hidden_states_12'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:3
outputs['hidden_states_13'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:4
outputs['hidden_states_2'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:5
outputs['hidden_states_3'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:6
outputs['hidden_states_4'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:7
outputs['hidden_states_5'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:8
outputs['hidden_states_6'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:9
outputs['hidden_states_7'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:10
outputs['hidden_states_8'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:11
outputs['hidden_states_9'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:12
outputs['last_hidden_state'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 5, 768)
name: StatefulPartitionedCall:13
outputs['pooler_output'] tensor_info:
dtype: DT_FLOAT
shape: (-1, 768)
name: StatefulPartitionedCall:14
Method name is: tensorflow/serving/predict
Defined Functions:
Function Name: '__call__'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: True
Option #2
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: False
Function Name: '_default_save_signature'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids')}
Function Name: 'call_and_return_all_conditional_losses'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: True
Option #2
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids/input_ids')}
Argument #2
DType: NoneType
Value: None
Argument #3
DType: NoneType
Value: None
Argument #4
DType: NoneType
Value: None
Argument #5
DType: NoneType
Value: None
Argument #6
DType: NoneType
Value: None
Argument #7
DType: NoneType
Value: None
Argument #8
DType: NoneType
Value: None
Argument #9
DType: NoneType
Value: None
Argument #10
DType: bool
Value: False
Function Name: 'serving'
Option #1
Callable with:
Argument #1
DType: dict
Value: {'input_ids': TensorSpec(shape=(None, None), dtype=tf.int32, name='input_ids'), 'attention_mask': TensorSpec(shape=(None, None), dtype=tf.int32, name='attention_mask'), 'token_type_ids': TensorSpec(shape=(None, None), dtype=tf.int32, name='token_type_ids')}
For the API call, I construct the request input to match what the model expects during normal inference, following the TF Serving REST API documentation (https://www.tensorflow.org/tfx/serving/api_rest):
import json
import requests

inference_url = "http://localhost:8501/v1/models/<my_model_name>:predict"
data = {
    "instances": [{
        "input_ids": input_ids.numpy().tolist(),
        "attention_mask": attention_mask.numpy().tolist(),
        "token_type_ids": token_type_id.numpy().tolist()
    }]
}
headers = {"content-type": "application/json"}
response = requests.post(inference_url, headers=headers, data=json.dumps(data))
The problem I'm facing is that when calling the API endpoint:
/v1/models/<my_model_name>:predict
It seems as if the model is not expecting the parameters "attention_mask" and "token_type_ids". Even though the "Function Name: 'serving'" section above shows that the model should accept all three of "input_ids", "attention_mask", and "token_type_ids", I still get the following error from the REST API:
{
"error": "Failed to process element: 0 key: attention_mask of 'instances' list. Error: Invalid argument: JSON object: does not have named input: attention_mask"
}
To me it looks like it may have something to do with the SignatureDef: the serving_default signature seems to expect only "input_ids", even though the model I am saving in Google Colab expects a dictionary with all three of "input_ids", "attention_mask", and "token_type_ids".
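If it helps, the signature TF Serving has actually loaded can also be double-checked through its REST metadata endpoint (a quick sanity-check sketch, using the same placeholder model name and port as above):
import json
import requests

# Ask TF Serving which named inputs/outputs the loaded signature exposes.
metadata_url = "http://localhost:8501/v1/models/<my_model_name>/metadata"
metadata = requests.get(metadata_url).json()
# The response contains the signature_def with its named inputs.
print(json.dumps(metadata, indent=2))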
Have I saved the model wrong somehow? Can someone give me a hint towards what I am doing wrong?
Many thanks in advance!
Upvotes: 0
Views: 1647
Reputation:
Please check the linked answer and format the input data accordingly: Debugging TensorFlow serving on BERT model
Try the below snippet for the API call:
import json
import requests
inference_url = "http://localhost:8501/v1/models/<my_model_name>:predict"
data = json.dumps({
    "signature_name": "serving_default",
    "instances": [{"input_ids": [input_ids.numpy().tolist()],
                   "attention_mask": [attention_mask.numpy().tolist()],
                   "token_type_ids": [token_type_id.numpy().tolist()]}]
})
headers = {"content-type": "application/json"}
response = requests.post(inference_url, data=data, headers=headers)
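If reformatting the request alone does not help, the error suggests the serving_default signature itself only exposes input_ids. A hedged alternative (a sketch based on the saved_model_cli output in the question, not something tested against this exact model) is to re-export the SavedModel using the model's traced serving function, which accepts all three tensors, as the serving signature:
import tensorflow as tf

# Hypothetical sketch: use the traced 'serving' function (shown in the
# saved_model_cli output with input_ids, attention_mask and token_type_ids)
# as the serving_default signature when re-exporting the model.
tf.saved_model.save(
    model,
    model_save_path,
    signatures={"serving_default": model.serving},
)
After re-exporting and reloading the model in TF Serving, the predict endpoint should in principle accept attention_mask and token_type_ids as named inputs as well.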
Upvotes: 0