SJay_747

Reputation: 31

HuggingFace: Loading checkpoint shards taking too long

Hi guys, I am running the following code:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('./cache/model')

tokenizer = AutoTokenizer.from_pretrained('./cache/model')

where I have cached a Hugging Face model using cache_dir within the from_pretrained() method. However, every time I load the model it has to load the checkpoint shards, which takes 7-10 minutes for each inference.
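For context, the original caching call was along these lines (the model name here is a placeholder; cache_dir is the real from_pretrained() parameter I mentioned):

```python
def cache_model(model_name, cache_dir='./cache/model'):
    """Download a model once and store its files under cache_dir.

    model_name is a placeholder for whichever Hub model was used;
    cache_dir is the from_pretrained() parameter mentioned above.
    Assumes the transformers library is installed.
    """
    from transformers import AutoModelForCausalLM
    return AutoModelForCausalLM.from_pretrained(model_name, cache_dir=cache_dir)
```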

Loading checkpoint shards: 67%|######6 | 2/3 [06:17<03:08, 188.79s/it]

Why is this taking so long even though I am loading the model locally, where it is already downloaded?

I am using some powerful GPUs, so the actual inference takes just a few seconds, but loading the model into memory takes a very long time.

Is there any way around this? I saw someone on a similar thread say they used safe_serialization with the save_pretrained() method, but my issue is that I am loading a pretrained model rather than fine-tuning and saving my own. Hence, I am unsure how to apply this plausible solution.
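Edit: writing out what I think applying that suggestion would look like, in case it helps anyone reading. The idea, as I understand it, would be a one-time conversion: load the sharded checkpoint once, re-save it locally with safe_serialization=True (the safetensors format, which loads faster), and point from_pretrained() at the new directory afterwards. An untested sketch, assuming transformers is installed and the destination path is mine to choose:

```python
def resave_as_safetensors(src_dir='./cache/model',
                          dst_dir='./cache/model-safetensors'):
    """One-time conversion: load the slow sharded checkpoint once,
    then re-save it with safe_serialization=True so that later
    from_pretrained(dst_dir) calls read the faster safetensors files.
    Assumes the transformers library is installed; dst_dir is an
    illustrative path of my own choosing."""
    from transformers import AutoModelForCausalLM, AutoTokenizer
    model = AutoModelForCausalLM.from_pretrained(src_dir)
    model.save_pretrained(dst_dir, safe_serialization=True)
    # Save the tokenizer alongside so dst_dir is self-contained
    AutoTokenizer.from_pretrained(src_dir).save_pretrained(dst_dir)
```

After running this once, subsequent loads would use from_pretrained('./cache/model-safetensors') instead of the original directory.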

Any help here would be great.

Upvotes: 2

Views: 3653

Answers (1)

Marco Maccarini

Reputation: 1

Remove the cache folder:

sudo rm -r .cache/huggingface/

Upvotes: -3
