Celi28

Reputation: 111

Embedding HuggingFaceModel on SageMaker with LlamaIndex returns 500

While trying to implement a HuggingFace embedding with LlamaIndex on SageMaker, the SageMaker endpoint returns a "Worker died" error (HTTP 500). Has anyone run into the same error, or any idea what is going wrong?

The documents (txt files) are read by SimpleDirectoryReader. The vector store is QdrantVectorStore, the docstore is MongoDocumentStore, and the index store is MongoIndexStore.
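
In case it helps reproduce the setup, here is a minimal sketch of how that reader and those stores can be wired up. The import paths assume the modular llama-index (>= 0.10) packages, and the URLs, collection name, and directory path are placeholders, not taken from the original project:

import qdrant_client
from llama_index.core import SimpleDirectoryReader
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.storage.index_store.mongodb import MongoIndexStore

# Read the .txt documents from a local directory (path is illustrative)
documents = SimpleDirectoryReader("./data").load_data()

# Vector store backed by Qdrant (URL and collection name are illustrative)
client = qdrant_client.QdrantClient(url="http://localhost:6333")
vector_store = QdrantVectorStore(client=client, collection_name="rag_docs")

# Document store and index store backed by MongoDB (URI is illustrative)
doc_store = MongoDocumentStore.from_uri(uri="mongodb://localhost:27017")
index_store = MongoIndexStore.from_uri(uri="mongodb://localhost:27017")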

The related code is (imports added for completeness):

# Imports assume the modular llama-index (>= 0.10) packages and the SageMaker Python SDK
from json import dumps

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.utils import name_from_base

from llama_index.core import Settings
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.sagemaker_endpoint import SageMakerEmbedding

# Hub model configuration. <https://huggingface.co/models>
hub = {
    'HF_MODEL_ID':'BAAI/bge-base-en-v1.5', # model_id from hf.co/models
    'HF_TASK':'feature-extraction', # NLP task you want to use for predictions
    'MAX_INPUT_LENGTH': dumps(1024),  # Max length of input text
    'MAX_TOTAL_TOKENS': dumps(2048),  # Max length of the generation (including input text)
    # 'SM_NUM_GPUS': '1',
}

# Configure the HF container
huggingface_model = HuggingFaceModel(
    env=hub, # configuration for loading model from Hub
    role=role, # iam role with permissions to create an Endpoint
    py_version='py310',
    transformers_version="4.37.0", # transformers version used
    pytorch_version="2.1.0", # pytorch version used
)

# Deploy HF embedding
embedding_predictor = huggingface_model.deploy(
    endpoint_name=name_from_base("bge-base-en-v15"),
    initial_instance_count=1,
    instance_type="ml.c6i.xlarge"
)

# Declare embedding_model
embedding_model = SageMakerEmbedding(
    endpoint_name=embedding_predictor.endpoint_name,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION
)
# Assumed step (not shown in the original snippet): register the SageMaker embedding
# as the global embedding model so that Settings.embed_model below resolves to it
Settings.embed_model = embedding_model

# Ingestion pipeline
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=250, chunk_overlap=50),
        Settings.embed_model,
    ],
    vector_store=vector_store,
    docstore=doc_store,
)
nodes = pipeline.run(documents=documents)

The SageMaker endpoint logs show:

WorkerThread - 9000-52f284e8 Worker disconnected. WORKER_MODEL_LOADED
WorkerLifeCycle - Frontend disconnected.
WorkerLifeCycle - Backend worker process died.
WorkerLifeCycle - Traceback (most recent call last):
WorkerLifeCycle -   File "/opt/conda/lib/python3.10/site-packages/mms/model_service_worker.py", line 175, in start_worker
WorkerLifeCycle -     self.handle_connection(cl_socket)
WorkerLifeCycle -   File "/opt/conda/lib/python3.10/site-packages/mms/model_service_worker.py", line 139, in handle_connection
WorkerLifeCycle -     cmd, msg = retrieve_msg(cl_socket)
WorkerLifeCycle -   File "/opt/conda/lib/python3.10/site-packages/mms/protocol/otf_message_handler.py", line 36, in retrieve_msg
WorkerLifeCycle -     cmd = _retrieve_buffer(conn, 1)
WorkerThread - Unknown exception
io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 35054253

Upvotes: 0

Views: 67

Answers (1)

Celi28

Reputation: 111

Issue fixed by increasing the model server's maximum request/response size (the 35054253-byte message in the log exceeds the server's default payload limit):

hub = {
    'HF_MODEL_ID': 'BAAI/bge-base-en-v1.5',    # model_id from hf.co/models
    'HF_TASK': 'feature-extraction',           # NLP task to use for predictions
    'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',  # give the worker more time per request
    'TS_MAX_RESPONSE_SIZE': '2000000000',      # TorchServe max response size (bytes)
    'TS_MAX_REQUEST_SIZE': '2000000000',       # TorchServe max request size (bytes)
    'MMS_MAX_RESPONSE_SIZE': '2000000000',     # Multi Model Server max response size (bytes)
    'MMS_MAX_REQUEST_SIZE': '2000000000',      # Multi Model Server max request size (bytes)
}
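
For completeness, a sketch of applying the enlarged limits by redeploying with the updated env dict, reusing the same HuggingFaceModel and deploy call from the question (container versions and instance type unchanged):

# Recreate the model with the enlarged payload limits and redeploy the endpoint
huggingface_model = HuggingFaceModel(
    env=hub,                        # updated hub dict with the size limits above
    role=role,
    py_version='py310',
    transformers_version="4.37.0",
    pytorch_version="2.1.0",
)

embedding_predictor = huggingface_model.deploy(
    endpoint_name=name_from_base("bge-base-en-v15"),
    initial_instance_count=1,
    instance_type="ml.c6i.xlarge",
)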

Upvotes: 0
