Reputation: 111
While trying to implement a HuggingFace embedding with LlamaIndex on SageMaker, the SageMaker endpoint returns a "Worker Died" error (HTTP 500). Has anyone hit the same error, or any idea what causes it?
The documents (txt files) are read with SimpleDirectoryReader. The vector store is QdrantVectorStore, the docstore is MongoDocumentStore, and the index store is MongoIndexStore.
The related code is:
# Imports (llama_index import paths assume the post-0.10 package layout)
from json import dumps

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.utils import name_from_base

from llama_index.core import Settings
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.sagemaker_endpoint import SageMakerEmbedding

# Hub model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'BAAI/bge-base-en-v1.5',  # model_id from hf.co/models
    'HF_TASK': 'feature-extraction',         # NLP task to use for predictions
    'MAX_INPUT_LENGTH': dumps(1024),         # max length of input text
    'MAX_TOTAL_TOKENS': dumps(2048),         # max length of the generation (including input text)
    # 'SM_NUM_GPUS': '1',
}
# Configure the HF container
huggingface_model = HuggingFaceModel(
    env=hub,                        # configuration for loading the model from the Hub
    role=role,                      # IAM role with permissions to create an endpoint
    py_version='py310',
    transformers_version='4.37.0',  # transformers version used
    pytorch_version='2.1.0',        # pytorch version used
)
# Deploy the HF embedding model
embedding_predictor = huggingface_model.deploy(
    endpoint_name=name_from_base("bge-base-en-v15"),
    initial_instance_count=1,
    instance_type="ml.c6i.xlarge",
)
# Declare the embedding model
embedding_model = SageMakerEmbedding(
    endpoint_name=embedding_predictor.endpoint_name,
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
    region_name=AWS_REGION,
)
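For context, here is a minimal sketch of how the objects referenced by the pipeline below (Settings.embed_model, documents, vector_store, doc_store) might be wired up; the directory path, Qdrant URL, collection name, Mongo URI, and database name are assumptions, not taken from the original setup:
from qdrant_client import QdrantClient
from llama_index.core import SimpleDirectoryReader
from llama_index.storage.docstore.mongodb import MongoDocumentStore
from llama_index.vector_stores.qdrant import QdrantVectorStore

Settings.embed_model = embedding_model                    # use the SageMaker endpoint for embeddings

documents = SimpleDirectoryReader("./data").load_data()   # txt files; path is an assumption

vector_store = QdrantVectorStore(
    client=QdrantClient(url="http://localhost:6333"),     # assumed Qdrant instance
    collection_name="embeddings",                         # assumed collection name
)
doc_store = MongoDocumentStore.from_uri(
    uri="mongodb://localhost:27017",                      # assumed Mongo URI
    db_name="docstore",                                   # assumed database name
)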
# Ingestion pipeline
pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=250, chunk_overlap=50),
        Settings.embed_model,
    ],
    vector_store=vector_store,
    docstore=doc_store,
)
nodes = pipeline.run(documents=documents)
The endpoint logs show:
WorkerThread - 9000-52f284e8 Worker disconnected. WORKER_MODEL_LOADED
WorkerLifeCycle - Frontend disconnected.
WorkerLifeCycle - Backend worker process died.
WorkerLifeCycle - Traceback (most recent call last):
WorkerLifeCycle -   File "/opt/conda/lib/python3.10/site-packages/mms/model_service_worker.py", line 175, in start_worker
WorkerLifeCycle -     self.handle_connection(cl_socket)
WorkerLifeCycle -   File "/opt/conda/lib/python3.10/site-packages/mms/model_service_worker.py", line 139, in handle_connection
WorkerLifeCycle -     cmd, msg = retrieve_msg(cl_socket)
WorkerLifeCycle -   File "/opt/conda/lib/python3.10/site-packages/mms/protocol/otf_message_handler.py", line 36, in retrieve_msg
WorkerLifeCycle -     cmd = _retrieve_buffer(conn, 1)
WorkerThread - Unknown exception
io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 35054253
Upvotes: 0
Views: 67
Reputation: 111
The issue was fixed by increasing the model server's maximum request/response size: the batched embedding payload (~35 MB, per the CorruptedFrameException above) exceeded the server's default message size limit. Setting the following environment variables in the hub configuration and redeploying resolved it:
hub = {
    'HF_MODEL_ID': 'BAAI/bge-base-en-v1.5',   # model_id from hf.co/models
    'HF_TASK': 'feature-extraction',          # NLP task to use for predictions
    'SAGEMAKER_MODEL_SERVER_TIMEOUT': '3600',
    'TS_MAX_RESPONSE_SIZE': '2000000000',
    'TS_MAX_REQUEST_SIZE': '2000000000',
    'MMS_MAX_RESPONSE_SIZE': '2000000000',
    'MMS_MAX_REQUEST_SIZE': '2000000000',
}
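For completeness, a minimal redeploy sketch using the enlarged limits; the container versions, endpoint name, and instance type are simply reused from the question and may need adjusting:
# Rebuild the model with the updated env and deploy a fresh endpoint
huggingface_model = HuggingFaceModel(
    env=hub,                        # hub config above, with the enlarged size limits
    role=role,
    py_version='py310',
    transformers_version='4.37.0',
    pytorch_version='2.1.0',
)
embedding_predictor = huggingface_model.deploy(
    endpoint_name=name_from_base("bge-base-en-v15"),
    initial_instance_count=1,
    instance_type="ml.c6i.xlarge",
)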
Upvotes: 0