Reputation: 31
Hi guys, I am running the following code:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('./cache/model')
tokenizer = AutoTokenizer.from_pretrained('./cache/model')
where I had previously cached a Hugging Face model using cache_dir within the from_pretrained() method. However, every time I load the model it has to load the checkpoint shards, which takes 7-10 minutes before each inference:
Loading checkpoint shards: 67%|######6 | 2/3 [06:17<03:08, 188.79s/it]
Why is this taking so long even though I am loading the model locally, where it is already downloaded?
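For context, the download/caching step I mentioned above looked roughly like this (I may be misremembering the exact arguments, and the model id below is only a placeholder, not the model I am actually using):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download once and cache the files under ./cache/model
model = AutoModelForCausalLM.from_pretrained('some-org/some-model', cache_dir='./cache/model')
tokenizer = AutoTokenizer.from_pretrained('some-org/some-model', cache_dir='./cache/model')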
I am using some powerful GPUs, so the actual inference only takes a few seconds, but the time it takes to load the model into memory is very long.
Is there any way around this? I saw someone on a similar thread say they used safe_serialization with the save_pretrained() method, but my issue is that I am loading a pretrained model rather than fine-tuning and saving my own, so I am unsure how to apply that suggestion (I have sketched below what I think they meant).
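In case it clarifies what I mean, below is roughly what I imagine that suggestion would look like: load the model once, re-save it locally in safetensors format, and then load from the re-saved folder on later runs. The target folder name is just an example, and I do not know whether this actually makes loading the shards faster, which is why I am asking:

from transformers import AutoModelForCausalLM, AutoTokenizer

# One-off step: load the already cached model ...
model = AutoModelForCausalLM.from_pretrained('./cache/model')
tokenizer = AutoTokenizer.from_pretrained('./cache/model')

# ... and re-save it with safetensors serialization (folder name is just an example)
model.save_pretrained('./cache/model-safetensors', safe_serialization=True)
tokenizer.save_pretrained('./cache/model-safetensors')

# Later runs would then load from the re-saved folder instead
model = AutoModelForCausalLM.from_pretrained('./cache/model-safetensors')
tokenizer = AutoTokenizer.from_pretrained('./cache/model-safetensors')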
Any help here would be great.
Upvotes: 2
Views: 3653