Reputation: 1
After running this code in a Jupyter notebook, it executes properly; however, the memory is still allocated on the GPU. How do I release this memory to free up space on my GPU? Sorry if I am formatting this question poorly; I am not used to posting. Here is the code:
llm = LLM(
model=model_path,
gpu_memory_utilization=0.7,
max_model_len=2048,
)
llm = LLM(
    model=model_path,
    dtype=torch.bfloat16,
    trust_remote_code=True,
    max_model_len=2048,
    quantization="bitsandbytes",
    load_format="bitsandbytes",
    gpu_memory_utilization=0.8,
)
I tried deleting llm and clearing the cache, which decreases the allocated and cached memory, but I cannot rerun the LLM constructor without getting an OOM error (the previous call still holds memory).
Upvotes: 0
Views: 85
Reputation: 211
I regularly use the holy trinity of cleanup with PyTorch: del, gc.collect(), and torch.cuda.empty_cache() (import gc at the top of the script). Run gc.collect() before empty_cache() so the blocks held by collected tensors can actually be released:
del llm
gc.collect()
torch.cuda.empty_cache()
That said, Jupyter notebooks are weird; you may just have to restart the kernel if these don't work, since Jupyter has its own caching mechanisms.
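The cleanup above can be wrapped in a small helper. This is a minimal sketch, not part of vLLM's API: `free_gpu` is a hypothetical function name, and the torch import is guarded so the helper still works where PyTorch is absent.

```python
import gc


def free_gpu(name, namespace):
    """Drop a reference by name, run GC, and (if torch is present) empty the CUDA cache.

    `name` is the variable name (e.g. "llm"); `namespace` is the dict holding it
    (in a notebook, pass globals()).
    """
    # Remove the Python reference so the object becomes collectable
    namespace.pop(name, None)
    # Collect reference cycles that may still pin CUDA tensors
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            # Return cached, now-unused blocks to the driver
            torch.cuda.empty_cache()
    except ImportError:
        pass  # torch not installed; nothing to empty
```

In a notebook you would call `free_gpu("llm", globals())` before re-instantiating the model. Note that stale references (e.g. a notebook `Out[]` entry still pointing at the old object) will keep the memory alive regardless.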
Upvotes: 0
Reputation: 383
Well, how about killing the vLLM-related process using pkill -9 -ef <part or whole of the vllm process name or cli command>? You can check which vLLM process is consuming GPU RAM with nvidia-smi, nvitop, or nvtop.
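A rough sketch of that workflow; the `vllm` pattern is an assumption about what the process's command line contains, and the kill line is left commented so you can verify the match first:

```shell
# Inspect GPU memory usage per process (requires an NVIDIA driver)
nvidia-smi

# List processes whose full command line (-f) mentions vllm
pgrep -af vllm || echo "no vllm processes found"

# Once you've confirmed the match, kill it:
# -9 sends SIGKILL, -e echoes what was killed, -f matches the full command line
# pkill -9 -ef vllm
```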
Upvotes: 0