Reputation: 11
I'm writing a script that sometimes needs to switch models (diffusers & llama-cpp-python). I don't have much RAM or VRAM, so I need to free both after using a model.
llama-cpp-python is fine; I just use:
import gc

del llama_object  # drop the last reference to the model
gc.collect()      # force garbage collection
That's enough, and the RAM is freed. But diffusers doesn't give the same result:
import gc
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(...)
# at this point I have 3.4 GB of RAM in use

del pipe
gc.collect()
torch.cuda.empty_cache()
# at this point I still have 3 GB of RAM in use
I don't have a powerful video card, so I'm using
pipe.enable_sequential_cpu_offload()
I read a few posts by people who have run into the same problem, and from what I understand only the VRAM gets released.
I don't know if this is relevant, but: RAM: 32 GB, graphics card: GTX 960 4 GB, OS: Manjaro Linux.
Any suggestions?
I've tried a lot of things. In one of the posts I found a way to clear VRAM with numba:
from numba import cuda

cuda.select_device(0)  # select the GPU
cuda.close()           # tear down the CUDA context on that device
but it didn't give the expected result. I also tried deleting various other objects, with no luck.
Upvotes: 1
Views: 792
Reputation: 11
So, the only thing I came up with is to move the model code into a separate script (something like model.py), launch it as a subprocess with parameters saying which model to run, and communicate with it over a socket. When switching models, I just close the subprocess and start a new one, so the OS reclaims all of its RAM and VRAM.
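A minimal sketch of the controller side, assuming a hypothetical model.py that accepts --model and --port arguments, loads one model, and answers one request per line over a local TCP socket (the script name, arguments, port and model names are all made up for illustration):

import socket
import subprocess
import sys
import time

PORT = 5555  # illustrative port number

def start_worker(model_name):
    # launch model.py in its own process, so all of its RAM/VRAM belongs to that process
    proc = subprocess.Popen(
        [sys.executable, "model.py", "--model", model_name, "--port", str(PORT)]
    )
    time.sleep(15)  # crude wait for the model to load; a real script should poll the port
    return proc

def ask(prompt):
    # send one request over the socket and return the worker's reply
    with socket.create_connection(("127.0.0.1", PORT)) as conn:
        conn.sendall(prompt.encode() + b"\n")
        return conn.recv(65536).decode()

worker = start_worker("stable-diffusion")
print(ask("a photo of an astronaut riding a horse"))

# switching models: kill the worker (this is what actually frees the RAM and VRAM),
# then start a new one for the other model
worker.terminate()
worker.wait()

worker = start_worker("llama")
print(ask("Write a haiku about memory leaks."))
worker.terminate()
worker.wait()

The downside is that switching models costs a full reload, but because the worker is a separate process, the OS reliably gives back every byte of RAM and VRAM when it exits.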
Upvotes: 0