cookiemonster

Reputation: 2134

How to load a huggingface pretrained transformer model directly to GPU?

I want to load a Hugging Face pretrained transformer model directly to GPU (there is not enough CPU memory for it), e.g. loading BERT:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")

the model would be loaded onto the CPU until executing

model.to('cuda')

Only then is the model on the GPU.

I want to load the model directly onto the GPU when calling from_pretrained. Is this possible?

Upvotes: 18

Views: 54503

Answers (1)

cookiemonster

Reputation: 2134

I'm answering my own question.

Hugging Face accelerate (install it via pip install accelerate) lets from_pretrained place the weights on the GPU without first materializing the whole model in CPU memory. It's useful when:

GPU memory > model size > CPU memory

To do this, specify device_map="cuda":

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")
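
As a quick sanity check (a minimal sketch, assuming a CUDA-capable GPU and that accelerate is installed; the model name is kept from the question), you can confirm where the weights ended up:

from transformers import AutoModelForCausalLM

# device_map requires the accelerate package; "cuda" places all weights on GPU 0
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")

# Every parameter should now report a CUDA device
print({p.device for p in model.parameters()})  # e.g. {device(type='cuda', index=0)}

If a single GPU still isn't enough, device_map="auto" lets accelerate split the model across the available devices, and passing torch_dtype=torch.float16 roughly halves the weight memory.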

Upvotes: 30
