Tsvi Sabo

Reputation: 675

Ray error when trying to deploy Llama 3 70B with vLLM on Vertex AI

Using Vertex AI custom container online predictions, I'm trying to deploy:

meta-llama/Meta-Llama-3-70B-Instruct

with vLLM 0.4.1 on 8 NVIDIA_L4 GPUs and getting:

/tmp/ray is over 95% full, available space: 5031063552; capacity: 101203873792. Object creation will fail if spilling is required.

This is the last log line I see; after that the deployment fails with no apparent reason. It looks like Vertex restarts the container, but eventually the deployment fails (probably due to a timeout).

Running the custom container on a VM had no issues.

To create the model I'm using the Google aiplatform SDK:

from google.cloud import aiplatform

model_resource = aiplatform.Model.upload(
    serving_container_image_uri=serving_container_image_uri,
    serving_container_shared_memory_size_mb=16384,  # 16 GB of shared memory
    ...
)
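For context, the endpoint deploy call is along these lines (the machine type shown is just what I'd expect for 8 L4s, not verbatim from my code):

endpoint = model_resource.deploy(
    machine_type="g2-standard-96",   # G2 shape that carries 8 NVIDIA L4 GPUs (illustrative)
    accelerator_type="NVIDIA_L4",
    accelerator_count=8,
)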

And to load the model with vLLM (code run inside the container):

from vllm import LLM

self.model = LLM(
    model=model_config.model_hf_name,
    dtype="auto",
    tensor_parallel_size=model_config.tensor_parallel_size,
    enforce_eager=model_config.enforce_eager,
    disable_custom_all_reduce=model_config.disable_custom_all_reduce,
    # vLLM uses Ray workers for multi-GPU tensor parallelism
    worker_use_ray=bool(model_config.tensor_parallel_size > 1),
    enable_prefix_caching=False,
    max_model_len=model_config.max_seq_len,
)

Upvotes: 2

Views: 461

Answers (1)

Tsvi Sabo

Reputation: 675

Apparently, Vertex AI online prediction with a custom container has a storage limitation for the running container.

So you need to set shared memory large enough for inter-GPU vLLM communication plus the model weights, which is ~142 GB+.
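As a rough sanity check on that number (a back-of-the-envelope sketch, not part of the original fix; the parameter count is approximate):

    # Meta-Llama-3-70B-Instruct has ~70.6B parameters; fp16/bf16 weights take 2 bytes each
    params = 70.6e9
    bytes_per_param = 2
    print(f"approx. weight storage: {params * bytes_per_param / 1e9:.0f} GB")  # ~141 GB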

  1. Upload the model with enough storage/shared memory:

    model_resource = aiplatform.Model.upload(
        serving_container_image_uri=serving_container_image_uri,
        serving_container_shared_memory_size_mb=240000,  # ~240 GB of shared memory
        ...
    )
  2. Point vLLM and Ray (vLLM's cluster-management dependency) to /dev/shm to avoid the storage exception; /dev/shm is a tmpfs sized by serving_container_shared_memory_size_mb, so it isn't limited by the container's disk:

    import os
    import ray

    model_path = "/dev/shm/new_model_path"
    ray_tmp_dir = "/dev/shm/tmp/ray"
    os.makedirs(ray_tmp_dir, exist_ok=True)
    ray.init(_temp_dir=ray_tmp_dir, num_gpus=model_config.tensor_parallel_size)

And download the model under /dev/shm/ as well:

    from vllm.engine.arg_utils import AsyncEngineArgs
    from vllm.engine.async_llm_engine import AsyncLLMEngine
    from vllm.usage.usage_lib import UsageContext

    engine_args = AsyncEngineArgs(
        model=model_config.hf_model_path,
        download_dir="/dev/shm/cache/huggingface",  # keep the HF cache on the tmpfs too
    )

    self.model = AsyncLLMEngine.from_engine_args(
        engine_args, usage_context=UsageContext.API_SERVER
    )
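For reference, a minimal usage sketch of the resulting engine (not from the original answer; the prompt and sampling parameters are illustrative):

    from vllm import SamplingParams

    async def complete(engine, prompt: str) -> str:
        params = SamplingParams(temperature=0.7, max_tokens=256)
        final = None
        # AsyncLLMEngine.generate() is an async generator of incremental RequestOutput objects
        async for output in engine.generate(prompt, params, request_id="req-0"):
            final = output
        return final.outputs[0].text

    # e.g. result = await complete(self.model, "Hello") inside the server's async handler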

Upvotes: 2
