Reputation: 79
I have this code that initializes a class with a model and a tokenizer from Hugging Face. On Google Colab this code works fine: it loads the model into GPU memory without problems. On Google Cloud Platform it does not work: the model never ends up on the GPU, whatever I try.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


class OPT:
    def __init__(self, model_name: str = "facebook/opt-2.7b", use_gpu: bool = False):
        self.model_name = model_name
        self.use_gpu = use_gpu and torch.cuda.is_available()
        print(f"Use gpu:: {self.use_gpu}")
        if self.use_gpu:
            print("Using gpu")
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name, torch_dtype=torch.float16
            ).cuda()
        else:
            print("Using cpu")
            self.model = AutoModelForCausalLM.from_pretrained(
                self.model_name, torch_dtype=torch.float32, low_cpu_mem_usage=True
            )
        # the fast tokenizer currently does not work correctly
        self.tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)
The printed output is correct:
Use gpu:: True
Using gpu
But nvidia-smi says that there is no process running on the GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.82.01    Driver Version: 470.82.01    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
And with htop I can see that the process is using CPU RAM.
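For reference, a quick way to double-check where the weights actually ended up, independent of nvidia-smi (a minimal sketch, assuming opt is an instance of the class above):

import torch

# The device of the first parameter tensor tells you where the weights live.
print(next(opt.model.parameters()).device)   # expected: cuda:0

# GPU memory PyTorch itself has allocated for tensors, in MiB.
print(torch.cuda.memory_allocated() / 1024 ** 2)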
Upvotes: 5
Views: 8494
Reputation: 57
You should use the .to(device) method, like this:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nameofyourmodel.to(device)
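Applied to the class in the question, that would look roughly like this (a sketch, not tested on your instance; the model name and dtype are taken from your snippet):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the weights, then move the whole module to the chosen device.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-2.7b", torch_dtype=torch.float16
).to(device)
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b", use_fast=False)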
Upvotes: 5
Reputation: 46
As with every PyTorch model, you need to move it to the GPU, and your batches of inputs as well, using the .to(device) method.
https://discuss.huggingface.co/t/is-transformers-using-gpu-by-default/8500
https://github.com/huggingface/transformers/issues/2704
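For example, the tensors returned by the tokenizer also have to be moved before calling generate (a short sketch, assuming model, tokenizer and device are already defined as in the other answer):

# Tokenize on CPU, then move the whole batch to the same device as the model.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))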
Upvotes: 0