Khaleel

Reputation: 1371

Loading checkpoint shards takes too long

I'm very new to generative AI. I have 64 GB of RAM and a 20 GB GPU. I took an open-source model from Hugging Face and wrote a Python script that simply prompts the out-of-the-box model and displays the result. I downloaded the model locally using save_pretrained and have been loading it from the local path ever since. It works, but every time I run the Python file it takes more than 10 minutes to display the results.

There is a step, "Loading checkpoint shards", that takes 6-7 minutes every time. Am I doing anything wrong? Why does it have to load something every time, even though the model is referenced from local storage?

I tried using local_files_only=True, cache_dir=cache_dir, low_cpu_mem_usage=True, and max_shard_size="200MB", but none of them solved the time issue.

How can I prompt the saved model directly, without so much delay, so it's actually usable? Any help would be highly appreciated.
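(Edit with more context: part of the delay is structural. Each run of the Python file starts a fresh process, so the checkpoint shards are read from disk again every time. Keeping one process alive and prompting it repeatedly pays the load cost only once. The sketch below shows just that pattern; load_model and generate are hypothetical stand-ins for the real transformers calls such as AutoModelForCausalLM.from_pretrained and model.generate.)

```python
# Minimal sketch of the "load once, prompt many times" pattern.
# load_model and generate are placeholders for the real transformers
# calls; the point is the process structure, not the model code.

def load_model(path):
    # Stand-in for the expensive "Loading checkpoint shards" step.
    # It runs once per Python process, not once per prompt.
    print(f"loading model from {path} ...")
    return {"path": path}

def generate(model, prompt):
    # Stand-in for tokenize -> model.generate -> decode.
    return f"[{model['path']}] response to: {prompt}"

def main():
    model = load_model("./my-local-model")    # pay the load cost once
    for prompt in ["hello", "what is 2+2?"]:  # then serve many prompts
        print(generate(model, prompt))

if __name__ == "__main__":
    main()
```

In practice this means wrapping the model in a long-running loop, REPL, or small web server instead of re-running the script per prompt.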

Upvotes: 12

Views: 22308

Answers (2)

xie

Reputation: 1

You can try this method: model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto", trust_remote_code=True).eval()

It may work.

Upvotes: -1

Luxin.Z

Reputation: 39

I had exactly the same problem, and I fixed it by setting safe_serialization=True in the save_pretrained() call. Hope this works for you. I would still like to know what was going on when loading a model in the vanilla .bin format, though.
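(As for the .bin question: a vanilla PyTorch .bin checkpoint is pickle-based, so loading it means running the whole deserialization machinery and copying every tensor into memory, while the safetensors format produced by safe_serialization=True stores raw values at known offsets and can be memory-mapped. The stdlib sketch below illustrates that contrast with a few floats; the flat file layout here is invented for the demo, not the real safetensors format.)

```python
# Sketch contrasting pickle deserialization (how .bin checkpoints are
# read) with a flat binary layout that can be memory-mapped (the idea
# behind safetensors). Pure stdlib, toy data.
import array
import mmap
import os
import pickle
import tempfile

weights = array.array("f", [0.1, 0.2, 0.3, 0.4])

with tempfile.TemporaryDirectory() as d:
    # "Vanilla .bin" style: the pickle must be fully parsed and every
    # object reconstructed in memory before any value is usable.
    bin_path = os.path.join(d, "model.bin")
    with open(bin_path, "wb") as f:
        pickle.dump(list(weights), f)
    with open(bin_path, "rb") as f:
        loaded_bin = pickle.load(f)  # one big eager deserialization pass

    # "safetensors" style: raw bytes at known offsets, so the OS can
    # memory-map the file and page data in lazily instead of copying it.
    flat_path = os.path.join(d, "model.flat")
    with open(flat_path, "wb") as f:
        f.write(weights.tobytes())
    with open(flat_path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        loaded_flat = array.array("f")
        loaded_flat.frombytes(mm[:])  # no parsing, just bytes at offsets
        mm.close()

assert list(loaded_bin) == list(loaded_flat)
```

Both paths recover the same values, but only the flat layout lets the loader skip the per-object reconstruction work, which is where the .bin loading time goes.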

Upvotes: 3
