Zachary Schwehr

Reputation: 1

Free GPU RAM used by vLLM

The code below runs properly within a Jupyter notebook. However, the memory it allocates is still held on the GPU afterwards. How do I free this memory to clear up space on my GPU? Sorry if I am formatting this question poorly; I am not used to posting. Here is the code:

import torch
from vllm import LLM

llm = LLM(
    model=model_path,
    gpu_memory_utilization=0.7,
    max_model_len=2048,
)

llm = LLM(model=model_path, dtype=torch.bfloat16, trust_remote_code=True,
          max_model_len=2048, quantization="bitsandbytes",
          load_format="bitsandbytes", gpu_memory_utilization=0.8)

I tried deleting llm and clearing the cache, which decreases the allocated and cached memory, but I cannot rerun the LLM constructor without getting an OOM error (the previous call still holds memory).
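The cleanup I tried looks roughly like this (llm is the object from the code above):

import gc
import torch

del llm                    # delete the engine object
gc.collect()
torch.cuda.empty_cache()   # allocated/cached memory drops here

The allocated and cached numbers go down, but calling LLM(...) again still raises the OOM error.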

Upvotes: 0

Views: 85

Answers (2)

Brandon Pardi

Reputation: 211

I regularly use the holy trinity of cleanup with PyTorch:

  1. Delete the model object with Python's del
  2. Empty the CUDA cache with torch.cuda.empty_cache()
  3. Run Python's garbage collector with gc.collect() (import gc at the top of the script)

import gc
import torch

del llm                    # drop the reference to the model
torch.cuda.empty_cache()   # return cached CUDA blocks to the driver
gc.collect()               # collect the now-unreferenced Python objects
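You can sanity-check that the memory actually came back using PyTorch's standard allocator counters:

print(f"allocated: {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"reserved:  {torch.cuda.memory_reserved() / 1e9:.2f} GB")

Both numbers should drop sharply after the three steps above. Note that memory held outside PyTorch's allocator (e.g. the CUDA context itself) will not show up in these counters.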

That said, Jupyter notebooks are weird; if these don't work you may just have to restart the kernel, since Jupyter has its own caching mechanisms that can keep references alive.

Upvotes: 0

heyzude

Reputation: 383

Well, how about killing the vLLM-related process with pkill -9 -ef <part or all of the vLLM process name or CLI command>? You can check which vLLM process is consuming GPU RAM with nvidia-smi, nvitop, or nvtop.
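For example, assuming the stray engine's command line contains "vllm" (adjust the pattern to whatever nvidia-smi actually shows):

nvidia-smi          # lists the PIDs holding GPU memory
pkill -9 -ef vllm   # -f matches the full command line, -e echoes what was killed

Keep in mind that signal 9 gives the process no chance to clean up, so treat this as a last resort when the in-process approaches fail.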

Upvotes: 0
