Reputation: 21
I'm encountering a "maximum recursion depth exceeded" error when I call `.predict()` on a deployed SageMaker endpoint (for a TensorFlow model). I've put logging statements in my inference script, specifically in my `input_handler` function and the data reading/cleaning functions that the handler calls.
The logs suggest that everything works until the first call to `read_latest_file()`, on `user_heart_rate_uri`: the log "Called all user_data_uri functions" is printed in CloudWatch, but "read_latest_file on heart rate called" is not. I've added further logging statements inside `read_latest_file()`, and I think I know where the issue occurs.
**Here is part of my `input_handler` function:**
```python
def input_handler(data, context):
    logger.info("input_handler entered")
    if context.request_content_type == 'application/json':  # should be context for testing
        logger.info(f"Raw data type: {type(data)}")
        d = data.read().decode('utf-8')
        logger.info(f"Decoded data type: {type(d)}")
        logger.info(f"d value: {d}")
        data_object = json.loads(d)
        data_object_dict_check = isinstance(data_object, dict)
        data_object_string_check = isinstance(data_object, str)
        if (not data_object_dict_check and data_object_string_check):
            logger.info("String and not dictionary")
            data_object = ast.literal_eval(data_object)
        data_object_type_check = type(data_object)
        # logger.info(f"data_object value : {data_object}")
        logger.info(f"Deserialized data type check: {data_object_type_check}")
        # logger.info(f"data_object's type is {type(data_object)} and keys are {data_object.keys()}")
        try:
            user_data = data_object["user_data"]
            user_ids = data_object["user_ids"]
        except:
            logger.info(f"Except block, data_object value: {data_object}")
        logger.info("Get runs on data_object")
        logger.info(f"{user_data.keys()}")
        heart_rate_dictionary = {}  # {userid: {date1: val, date}}
        steps_dictionary = {}
        nonseq_dictionary = {}
        for user_id in user_ids:
            logger.info(f"Going through user: {user_id}")
            user_data_uri = user_data[user_id]  # this gives me a dictionary
            user_heart_rate_uri = user_data_uri["heart_rate"]
            user_step_count_uri = user_data_uri["step_count"]
            user_sleep_uri = user_data_uri["sleep"]
            user_demographic_uri = user_data_uri["demographic"]
            logger.info("Called all user_data_uri functions")
            deserialized_heart_data = read_latest_file(user_heart_rate_uri)
            logger.info("read_latest_file on heart rate called")
            deserialized_step_data = read_latest_file(user_step_count_uri)
            logger.info("read_latest_file on step data called")
            deserialized_sleep_data = read_latest_file(user_sleep_uri)
            logger.info("read_latest_file on sleep data called")
            deserialized_demographic_data = read_demographic(user_demographic_uri)
            logger.info("read_demographic called")
        logger.info("Called all read file functions")
```
**Here is my `read_latest_file()` function:**
```python
def read_latest_file(folder_uri):
    logger.info("read_latest_file entered")
    s3_client = boto3.client("s3")
    logger.info("s3_client initialized")
    bucket_name = "nashs3bucket15927-dev"
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder_uri)
    logger.info("list_objects_v2 called")
    latest_file = max(response.get('Contents', []), key=lambda x: x['LastModified']) if 'Contents' in response else None
    logger.info("Latest file found")
    if latest_file:
        logger.info("latest file not empty")
        # print("working")
        file_key = latest_file['Key']
        logger.info(f"file key: {file_key}")
        # Read the JSON file content from S3
        response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
        logger.info("get_object called, response received")
        file_content = json.loads(response['Body'].read().decode('utf-8'))
        logger.info("file decoded and deserialized, file_content received")
        logger.info(f"length of file_content: {len(file_content)}")
        return file_content
    else:
        logger.info("latest file empty")
        return None
```
The log "read_latest_file entered"
is being printed. However, "s3_client initialized"
is not being printed. More CloudWatch logs suggest that calling boto3.client("s3") is giving me the error.
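To pin the failure to the exact statement, a minimal temporary diagnostic like this sketch can be dropped in (not part of my real handler; `RecursionError` is the built-in exception Python raises on "maximum recursion depth exceeded"):

```python
import traceback

try:
    s3_client = boto3.client("s3")  # the call that appears to blow the stack
except RecursionError:
    # Log the full stack so CloudWatch shows exactly where the recursion loops
    logger.error("boto3.client('s3') recursed:\n%s", traceback.format_exc())
    raise
```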
**Relevant CloudWatch logs around `boto3.client("s3")`:**
```
1596 2024-01-08 19:28:38,664 INFO read_latest_file entered
1596 2024-01-08 19:28:38,680 ERROR exception handling request: maximum recursion depth exceeded
Traceback (most recent call last):
  File "/sagemaker/python_service.py", line 373, in _handle_invocation_post
    res.body, res.content_type = handlers(data, context)
  File "/sagemaker/python_service.py", line 405, in handler
    processed_input = custom_input_handler(data, context)
  File "/opt/ml/model/code/inference.py", line 65, in input_handler
    deserialized_heart_data = read_latest_file(user_heart_rate_uri)
  File "/opt/ml/model/code/inference.py", line 133, in read_latest_file
    s3_client = boto3.client("s3")
  File "/usr/local/lib/python3.9/site-packages/boto3/__init__.py", line 92, in client
    return _get_default_session().client(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/boto3/session.py", line 299, in client
    return self._session.create_client(
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 951, in create_client
    credentials = self.get_credentials()
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 507, in get_credentials
    self._credentials = self._components.get_component(
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 1108, in get_component
    self._components[name] = factory()
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 186, in _create_credential_resolver
    return botocore.credentials.create_credential_resolver(
  File "/usr/local/lib/python3.9/site-packages/botocore/credentials.py", line 92, in create_credential_resolver
    container_provider = ContainerProvider()
  File "/usr/local/lib/python3.9/site-packages/botocore/credentials.py", line 1894, in __init__
    fetcher = ContainerMetadataFetcher()
  File "/usr/local/lib/python3.9/site-packages/botocore/utils.py", line 2846, in __init__
    session = botocore.httpsession.URLLib3Session(
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 313, in __init__
    self._manager = PoolManager(**self._get_pool_manager_kwargs())
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 331, in _get_pool_manager_kwargs
    'ssl_context': self._get_ssl_context(),
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 340, in _get_ssl_context
    return create_urllib3_context()
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 129, in create_urllib3_context
    context.options |= options
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
[Previous line repeated 476 more times]
```
I'm expecting my S3 client to get initialized so that I can access S3 resources from within my endpoint to perform inference.
Things I've tried:
- Moving `s3_client = boto3.client("s3")` outside `input_handler`, right after all the imports in `inference.py` (see the sketch after this list). This gave me the same SSL issue.
- Checking the permissions of the role my SageMaker endpoint has. The role has full SageMaker access, which includes access to S3 resources. When creating the endpoint from a SageMaker Studio notebook, I used `get_execution_role()` and passed the result in as the `role` parameter.
- Checking whether the endpoint was created in a VPC. When I go to VPC in the AWS Console, under 'Endpoints' it says there are no endpoints.
- Consulting GPT-4, which thinks it's a low-level networking issue and that I should contact AWS. I doubt that; it said the same about a past issue of mine that turned out to be neither that low-level nor that difficult.
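For reference, the module-level variant from the first bullet looked roughly like this (a sketch; the imports shown are just the ones my handler visibly uses):

```python
# inference.py: sketch of the "client at module scope" attempt
import ast
import json
import logging

import boto3

logger = logging.getLogger(__name__)

# Moved here, right after the imports; still raised the same SSL recursion
s3_client = boto3.client("s3")

def input_handler(data, context):
    ...
```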
So, finally, my question: why is calling `boto3.client("s3")` in the inference script of my SageMaker endpoint giving me a maximum recursion depth error that seemingly stems from some SSL issue?
Upvotes: 2
Views: 893
Reputation: 2123
This error can be caused by `gevent`, used either directly or indirectly (by another library).
If it is `gevent`, the solution is to add
```python
import gevent.monkey
gevent.monkey.patch_all()
```
before any `boto3`/`requests` imports.
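For example, the top of `inference.py` would then look roughly like this (a sketch; whether `gevent` is actually present in your container is an assumption, and `gevent.monkey.is_module_patched` is just an optional sanity check):

```python
# Top of inference.py: if gevent is involved, patch before anything
# imports ssl/socket (boto3, botocore and requests all do).
import gevent.monkey
gevent.monkey.patch_all()

# Optional sanity check: confirm ssl really got patched
assert gevent.monkey.is_module_patched("ssl")

import json   # noqa: E402 (imports deliberately placed after patch_all)
import boto3  # noqa: E402

s3_client = boto3.client("s3")  # should no longer recurse inside SSLContext
```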
Upvotes: 0