user3684119

Reputation: 63

Invoke endpoint after model deployment : [Err 104] Connection reset by peer

I am new to SageMaker. I have deployed my trained TensorFlow model using its JSON and weights files. Strangely, my notebook never said "Endpoint successfully built"; it only showed the following:

--------------------------------------------------------------------------------!

Instead, I found the endpoint name in my console.

import sagemaker
from sagemaker.tensorflow.model import TensorFlowPredictor

predictor = TensorFlowPredictor(endpoint_name, sagemaker_session)
data = test_out2
predictor.predict(data)

Then I tried to invoke the endpoint with a 2D array: (1) If the array has shape (5000, 170), I get this error:

ConnectionResetError: [Errno 104] Connection reset by peer

(2) If I reduce the array to shape (10, 170), the error is:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "". See https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-2019-04-28-XXXXXXXXX in account 15XXXXXXXX for more information.

Any suggestions, please? I found a similar case on GitHub: https://github.com/awslabs/amazon-sagemaker-examples/issues/589.

Is mine the same case?

Thank you very much in advance!

Upvotes: 0

Views: 897

Answers (2)

EnterpriseMike

Reputation: 195

I had this problem and this post helped me resolve it. There does seem to be a limit to the size of the dataset that the predictor will take. I'm not sure what it is, but in any case I now split my training/test data differently.

I assume there's a limit and that it is based on raw data volume. In rough terms that translates to the number of cells in my dataframe, since each cell is probably an integer or a float.

If I can get a 70%/30% split I use that, but if 30% test data would exceed the maximum number of cells, I split the data to give me the largest number of test rows that fits within that maximum.

Here's the split code:

import numpy as np
# model_data is assumed to be a pandas DataFrame

# Check that the test data isn't too big for the predictor
max_test_cells = 200000
model_rows, model_cols = model_data.shape
print('model_data.shape=', model_data.shape)

# Maximum number of test rows that fits within the cell budget
max_test_rows = int(max_test_cells / model_cols)
print('max_test_rows=', max_test_rows)

# Aim for a 30% test set, capped at the maximum
test_rows = min(int(0.3 * len(model_data)), max_test_rows)
print('actual_test_rows=', test_rows)
training_rows = model_rows - test_rows
print('training_rows=', training_rows)

# Shuffle, then split to get the largest test set possible
train_data, test_data = np.split(model_data.sample(frac=1, random_state=1729), [training_rows])
print(train_data.shape, test_data.shape)

Upvotes: 0

SphericalCow

Reputation: 176

The first error with data of shape (5000, 170) might be a capacity issue. SageMaker endpoint requests have a payload size limit of 5 MB, so if your data is larger than that, you need to chop it into pieces and call predict multiple times, as sketched below.
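As a rough illustration, here is a minimal sketch of that batching approach. The predict_in_batches helper and the 500-row batch size are assumptions for illustration, not part of the SageMaker SDK; pick a batch size so each request stays under the payload limit.

# Hypothetical helper: split the input into row batches and call
# predict once per batch, collecting the partial results.
def predict_in_batches(predictor, data, batch_size=500):
    results = []
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]    # rows [start, start + batch_size)
        results.append(predictor.predict(batch))  # one request per batch
    return results

# e.g. for the (5000, 170) array from the question:
# predictions = predict_in_batches(predictor, test_out2, batch_size=500)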

For the second error with data of shape (10, 170), the error message points you to the logs. Did you find anything interesting in the CloudWatch log? Is there anything you can share in this question?

Upvotes: 1
