Malhar

Reputation: 21

AWS SageMaker Endpoint: Maximum Recursion Depth Exceeded Error When Calling boto3.client("s3")

I'm encountering a maximum recursion depth exceeded error when I call .predict() on a deployed SageMaker endpoint (for a TensorFlow model). I've put logging statements in my inference script, specifically in my input_handler function and in the data reading/cleaning functions that the handler calls.

The logs suggest that everything works until the first call to read_latest_file(), on user_heart_rate_uri: the log "Called all user_data_uri functions" is printed in CloudWatch, but "read_latest_file on heart rate called" is not. I've also put logging statements inside read_latest_file() itself, and I think I know where the issue is occurring.

**Here is part of my input_handler function:**

def input_handler(data, context):

    logger.info("input_handler entered")

    if context.request_content_type == 'application/json': # should be context for testing

        logger.info(f"Raw data type: {type(data)}")
        
        d = data.read().decode('utf-8')
        logger.info(f"Decoded data type: {type(d)}")
        logger.info(f"d value: {d}")
            
        data_object = json.loads(d)
        data_object_dict_check = isinstance(data_object, dict)
        data_object_string_check = isinstance(data_object, str)

        if (not data_object_dict_check and data_object_string_check):
            logger.info("String and not dictionary")
            data_object = ast.literal_eval(data_object)

        data_object_type_check = type(data_object)
        
     #   logger.info(f"data_object value : {data_object}")
        logger.info(f"Deserialized data type check: {data_object_type_check}")

       # logger.info(f"data_object's type is {type(data_object)} and keys are {data_object.keys()}")

        try:
            user_data = data_object["user_data"]
            user_ids = data_object["user_ids"]
        except:
            logger.info(f"Except block, data_object value: {data_object}")

        logger.info("Get runs on data_object")
        logger.info(f"{user_data.keys()}")

        heart_rate_dictionary = {}  # {userid: {date1: val, date}}
        steps_dictionary = {}
        nonseq_dictionary = {}

        for user_id in user_ids:

            logger.info(f"Going through user: {user_id}")
            user_data_uri = user_data[user_id]  # this gives me a dictionary

            user_heart_rate_uri = user_data_uri["heart_rate"]
            user_step_count_uri = user_data_uri["step_count"]
            user_sleep_uri = user_data_uri["sleep"]
            user_demographic_uri = user_data_uri["demographic"]

            logger.info("Called all user_data_uri functions")

            deserialized_heart_data = read_latest_file(user_heart_rate_uri)
            logger.info("read_latest_file on heart rate called")
            deserialized_step_data = read_latest_file(user_step_count_uri)
            logger.info("read_latest_file on step data called")
            deserialized_sleep_data = read_latest_file(user_sleep_uri)
            logger.info("read_latest_file on sleep data called")
            deserialized_demographic_data = read_demographic(user_demographic_uri)
            logger.info("read_demographic called")

            logger.info("Called all read file functions")

Here is my read_latest_file() function:

def read_latest_file(folder_uri):
    logger.info("read_latest_file entered")
    s3_client = boto3.client("s3")
    logger.info("s3_client initialized")
    bucket_name = "nashs3bucket15927-dev"
    response = s3_client.list_objects_v2(Bucket=bucket_name, Prefix=folder_uri)
    logger.info("list_objects_v2 called")
    latest_file = max(response.get('Contents', []), key=lambda x: x['LastModified']) if 'Contents' in response else None
    logger.info("Latest file found")

    if latest_file:
        logger.info("latest file not empty")
      #  print("working")
        file_key = latest_file['Key']
        logger.info(f"file key: {file_key}")
        # Read the JSON file content from S3
        response = s3_client.get_object(Bucket=bucket_name, Key=file_key)
        logger.info("get_object called, response received")
        file_content = json.loads(response['Body'].read().decode('utf-8'))
        logger.info("file decoded and deserialized, file_content received")
        logger.info(f"length of file_content: {len(file_content)}")
        return file_content
    else:
        logger.info("latest file empty")
        return None

The log "read_latest_file entered" is being printed. However, "s3_client initialized" is not being printed. More CloudWatch logs suggest that calling boto3.client("s3") is giving me the error.

Relevant CloudWatch Logs about boto3.client("s3")

1596 2024-01-08 19:28:38,664 INFO read_latest_file entered

1596 2024-01-08 19:28:38,680 ERROR exception handling request: maximum recursion depth exceeded

Traceback (most recent call last):
  File "/sagemaker/python_service.py", line 373, in _handle_invocation_post
    res.body, res.content_type = handlers(data, context)
  File "/sagemaker/python_service.py", line 405, in handler
    processed_input = custom_input_handler(data, context)
  File "/opt/ml/model/code/inference.py", line 65, in input_handler
    deserialized_heart_data = read_latest_file(user_heart_rate_uri)
  File "/opt/ml/model/code/inference.py", line 133, in read_latest_file
    s3_client = boto3.client("s3")
  File "/usr/local/lib/python3.9/site-packages/boto3/__init__.py", line 92, in client
    return _get_default_session().client(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/boto3/session.py", line 299, in client
    return self._session.create_client(
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 951, in create_client
    credentials = self.get_credentials()
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 507, in get_credentials
    self._credentials = self._components.get_component(
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 1108, in get_component
    self._components[name] = factory()
  File "/usr/local/lib/python3.9/site-packages/botocore/session.py", line 186, in _create_credential_resolver
    return botocore.credentials.create_credential_resolver(
  File "/usr/local/lib/python3.9/site-packages/botocore/credentials.py", line 92, in create_credential_resolver
    container_provider = ContainerProvider()
  File "/usr/local/lib/python3.9/site-packages/botocore/credentials.py", line 1894, in __init__
    fetcher = ContainerMetadataFetcher()
  File "/usr/local/lib/python3.9/site-packages/botocore/utils.py", line 2846, in __init__
    session = botocore.httpsession.URLLib3Session(
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 313, in __init__
    self._manager = PoolManager(**self._get_pool_manager_kwargs())
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 331, in _get_pool_manager_kwargs
    'ssl_context': self._get_ssl_context(),
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 340, in _get_ssl_context
    return create_urllib3_context()
  File "/usr/local/lib/python3.9/site-packages/botocore/httpsession.py", line 129, in create_urllib3_context
    context.options |= options
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  File "/usr/local/lib/python3.9/ssl.py", line 602, in options
    super(SSLContext, SSLContext).options.__set__(self, value)
  [Previous line repeated 476 more times]

I'm expecting my S3 client to get initialized so that I can access S3 resources from within my endpoint to perform inference.

Things I've tried:

  1. Moving s3_client = boto3.client("s3") outside the input_handler, right after all the imports in inference.py (roughly as in the sketch after this list). This continued to give me the same SSL recursion error.

  2. Checking the permissions of the role my SageMaker endpoint uses. The role has full SageMaker access, which includes access to S3 resources. When creating the endpoint from a SageMaker Studio notebook, I used get_execution_role() to get the role and passed that in as the role parameter.

  3. Checking if the endpoint was created in a VPC. When I go to VPC on the AWS Console, under 'Endpoints', it says there are no endpoints.

  4. Consulting GPT-4. GPT-4 thinks it's a low-level networking issue and that I should contact AWS, but I doubt it; GPT said the same about another issue I had in the past, and that one turned out not to be so low-level or difficult.
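For reference, this is roughly what the module-level attempt from item 1 looked like (a minimal sketch; the surrounding imports and handler body are abbreviated from my actual inference.py):

# Sketch of the arrangement from item 1: the client is created once at module
# import time, right after the imports, instead of inside read_latest_file().
import json
import logging

import boto3

logger = logging.getLogger(__name__)

# This still produced the same SSL recursion error once the endpoint
# loaded and invoked the script.
s3_client = boto3.client("s3")

def input_handler(data, context):
    ...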

So, finally, my question: why is calling boto3.client("s3") in the inference script of my SageMaker endpoint giving me a maximum recursion depth error that seemingly stems from some SSL issue?

Upvotes: 2

Views: 893

Answers (1)

Bert Blommers

Reputation: 2123

This error can be caused by gevent, used either directly or indirectly (by another library).

If it is gevent, the solution would be to add

import gevent.monkey
gevent.monkey.patch_all()

before any boto3/requests are imported.
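For example, if inference.py is the entry point, the patch would sit at the very top of the script, before boto3/botocore (and anything that imports them) is loaded. A sketch, assuming gevent really is present in the serving container:

# Top of inference.py (a sketch; assumes gevent is actually installed in the
# serving container). The patch must run before boto3/botocore/requests are
# imported; if ssl is loaded before the patch, you can end up in the
# SSLContext recursion described in the gevent issue linked below.
import gevent.monkey
gevent.monkey.patch_all()

import json     # noqa: E402  (imports deliberately placed after the patch)
import logging  # noqa: E402
import boto3    # noqa: E402

logger = logging.getLogger(__name__)

def read_latest_file(folder_uri):
    # client creation should no longer recurse if gevent was the cause
    s3_client = boto3.client("s3")
    ...

A quick way to check whether gevent is being pulled in indirectly is to log `'gevent' in sys.modules` from the inference script before creating the client.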

Related questions/links:

  1. https://github.com/gevent/gevent/issues/941
  2. "RecursionError: maximum recursion depth exceeded" from ssl.py: `super(SSLContext, SSLContext).options.__set__(self, value)`

Upvotes: 0
