cookiemonster

Reputation: 2134

How to load a huggingface pretrained transformer model directly to GPU?

I want to load a Hugging Face pretrained transformer model directly to GPU (there is not enough CPU memory for it), e.g. loading BERT:

from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased")

the model would be loaded onto the CPU until executing

model.to('cuda')

Only then is the model on the GPU.

I want to load the model directly onto the GPU when calling from_pretrained. Is this possible?

Upvotes: 18

Views: 54503

Answers (1)

cookiemonster

Reputation: 2134

I'm answering my own question.

Hugging Face accelerate (install it via pip install accelerate) lets from_pretrained place the weights on the GPU without first materializing the whole model in CPU memory. It's useful when:

GPU memory > model size > CPU memory

To do this, specify device_map="cuda":

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")
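
As a quick sanity check (a minimal sketch, assuming a CUDA-capable GPU and that accelerate is installed; the model name is kept from the question), you can confirm where the weights ended up:

from transformers import AutoModelForCausalLM

# device_map requires the accelerate package; "cuda" places all weights on GPU 0
model = AutoModelForCausalLM.from_pretrained("bert-base-uncased", device_map="cuda")

# Every parameter should now report a CUDA device
print({p.device for p in model.parameters()})  # e.g. {device(type='cuda', index=0)}

If a single GPU still isn't enough, device_map="auto" lets accelerate split the model across the available devices, and passing torch_dtype=torch.float16 roughly halves the weight memory.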

Upvotes: 30
