Reputation: 1
Background: I am trying to fine-tune Microsoft's Phi-2 model, a 2.7-billion-parameter LLM published on Hugging Face, with instruction tuning on a little over 2000 quotes. I want to create a recognizable output change due to the instruction tuning, and I later want to extract word embeddings from the base model and from my fine-tuned model to compare them. I am working in a Jupyter notebook in VS Code, inside a virtual environment, and I have access to a server with enough capacity to handle LLMs. I have already successfully tokenized the data to feed into the model, loaded and tested the base model, and moved everything to CUDA/GPU (defined as device). HOWEVER, MY PROBLEM: when I try to feed the tokenized training and evaluation datasets into the model for training, I get the following error message, which as far as I understand indicates that the generator is on the CPU and not on CUDA:
ERROR MESSAGE
RuntimeError Traceback (most recent call last) Cell In[51], line 37 11 trainer = transformers.Trainer( 12 model=model, 13 train_dataset=tokenized_train_dataset, (...) 33 data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False), 34 ) 36 model.config.use_cache = False # silence the warnings. Please re-enable for inference! ---> 37 trainer.train()
File ~/.venv/lib/python3.10/site-packages/transformers/trainer.py:1780, in Trainer.train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs) 1778 hf_hub_utils.enable_progress_bars() 1779 else: -> 1780 return inner_training_loop( 1781 args=args, 1782 resume_from_checkpoint=resume_from_checkpoint, 1783 trial=trial, 1784 ignore_keys_for_eval=ignore_keys_for_eval, 1785 )
File ~/.venv/lib/python3.10/site-packages/transformers/trainer.py:2085, in Trainer._inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval) 2082 rng_to_sync = True 2084 step = -1 -> 2085 for step, inputs in enumerate(epoch_iterator): 2086 total_batched_samples += 1 2088 if self.args.include_num_input_tokens_seen:
File ~/.venv/lib/python3.10/site-packages/accelerate/data_loader.py:452, in DataLoaderShard.__iter__(self) 450 # We iterate one batch ahead to check when we are at the end 451 try: --> 452 current_batch = next(dataloader_iter) 453 except StopIteration: 454 yield
File ~/.venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py:631, in _BaseDataLoaderIter.__next__(self) 628 if self._sampler_iter is None: 629 # TODO(https://github.com/pytorch/pytorch/issues/76750) 630 self._reset() # type: ignore[call-arg] --> 631 data = self._next_data() 632 self._num_yielded += 1 633 if self._dataset_kind == _DatasetKind.Iterable and 634 self._IterableDataset_len_called is not None and 635 self._num_yielded > self._IterableDataset_len_called:
File ~/.venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py:674, in _SingleProcessDataLoaderIter._next_data(self) 673 def _next_data(self): --> 674 index = self._next_index() # may raise StopIteration 675 data = self._dataset_fetcher.fetch(index) # may raise StopIteration 676 if self._pin_memory:
File ~/.venv/lib/python3.10/site-packages/torch/utils/data/dataloader.py:621, in _BaseDataLoaderIter._next_index(self) 620 def _next_index(self): --> 621 return next(self._sampler_iter)
File ~/.venv/lib/python3.10/site-packages/torch/utils/data/sampler.py:287, in BatchSampler.__iter__(self) 285 batch = [0] * self.batch_size 286 idx_in_batch = 0 --> 287 for idx in self.sampler: 288 batch[idx_in_batch] = idx 289 idx_in_batch += 1
File ~/.venv/lib/python3.10/site-packages/accelerate/data_loader.py:92, in SeedableRandomSampler.__iter__(self) 90 # print("Setting seed at epoch", self.epoch, seed) 91 self.generator.manual_seed(seed) ---> 92 yield from super().__iter__() 93 self.set_epoch(self.epoch + 1)
File ~/.venv/lib/python3.10/site-packages/torch/utils/data/sampler.py:167, in RandomSampler.__iter__(self) 165 else: 166 for _ in range(self.num_samples // n): --> 167 yield from torch.randperm(n, generator=generator).tolist() 168 yield from torch.randperm(n, generator=generator).tolist()[:self.num_samples % n]
File ~/.venv/lib/python3.10/site-packages/torch/utils/_device.py:77, in DeviceContext.__torch_function__(self, func, types, args, kwargs) 75 if func in _device_constructors() and kwargs.get('device') is None: 76 kwargs['device'] = self.device ---> 77 return func(*args, **kwargs)
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
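As far as I can tell from the last frames, the clash is between torch.set_default_device("cuda") and the CPU generator that accelerate's sampler creates. A minimal sketch that, I believe, reproduces the same RuntimeError in isolation (on a CUDA machine):

import torch

torch.set_default_device("cuda")  # as in my notebook

g = torch.Generator()            # created on the CPU, like accelerate's sampler generator
torch.randperm(10, generator=g)  # randperm gets redirected to 'cuda' -> same RuntimeError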
Additionally, I get the following warnings:
Using the WANDB_DISABLED environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).
./.venv/lib/python3.10/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to Accelerator is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an accelerate.DataLoaderConfiguration instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True) warnings.warn(
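(The first warning seems unrelated to the crash; if I read it right, it just wants the reporting integration set explicitly in TrainingArguments. A minimal sketch of what I believe it is asking for:)

from transformers import TrainingArguments

# replaces the deprecated WANDB_DISABLED environment variable;
# "none" turns off all reporting integrations such as wandb
args = TrainingArguments(output_dir="./out", report_to="none")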
The relevant parts of my Jupyter notebook are, first, the training code:
#import wandb
import transformers
from datetime import datetime
import torch

torch.set_default_device("cuda")

project = "ideollm"
base_model_name = "phi2"
run_name = base_model_name + "-" + project
output_dir = "./" + run_name

trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_val_dataset,
    args=transformers.TrainingArguments(
        output_dir=output_dir,
        warmup_steps=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=1,
        max_steps=500,
        learning_rate=2.5e-5,         # Want a small lr for finetuning
        optim="paged_adamw_8bit",
        logging_steps=25,             # When to start reporting loss
        logging_dir="./logs",         # Directory for storing logs
        save_strategy="steps",        # Save the model checkpoint every logging step
        save_steps=25,                # Save a checkpoint every 25 steps
        evaluation_strategy="steps",  # Evaluate the model every logging step
        eval_steps=25,                # Evaluate every 25 steps
        do_eval=True,                 # Perform evaluation during training
        #report_to="wandb",
        #run_name=f"{run_name}-{datetime.now().strftime('%Y-%m-%d-%H-%M')}"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()
AND, from before the tokenizing:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

for i, tokens in tokenized_train_dataset.items():
    output_ids = model.generate(
        tokenized_train_dataset[i].cuda(),
        do_sample=True,
        max_new_tokens=270,
        early_stopping=True,
    )
    output = tokenizer.batch_decode(output_ids)
    print(output)
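(model and tokenizer come from an earlier cell. A minimal sketch of that loading step; the exact flags here are assumptions, not my verbatim code:)

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = torch.device("cuda")

# sketch of the earlier loading cell; dtype and other flags are assumptions
tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-2")
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    torch_dtype=torch.float16,  # assumed: half precision to fit on the GPU
)
model.to(device)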
I tried:
I get the same error over and over, even though I tried changing the device and the data processing; the model does not train at all. One variant I tried looked roughly like the sketch below.
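(A sketch of the kind of change I attempted, not my exact cell:)

import torch

# attempted: move the model and all tokenized tensors to the GPU explicitly,
# instead of relying on torch.set_default_device("cuda")
device = torch.device("cuda")
model.to(device)
tokenized_train_dataset = {k: v.to(device) for k, v in tokenized_train_dataset.items()}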
Upvotes: 0
Views: 394
Reputation: 1
If you want to use the GPU specifically, then you can set the device_map parameter like this: device_map={"0": "cuda:<GPU_id>", "1": "cuda:<GPU_id>", "2": "cuda:<GPU_id>"}
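For example (a sketch; device_map is passed to from_pretrained, and "auto" lets Accelerate place the layers across the available GPUs for you):

from transformers import AutoModelForCausalLM

# sketch: device_map="auto" spreads the model over the visible GPUs;
# an explicit mapping such as {"": 0} pins the whole model to GPU 0
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-2",
    device_map="auto",
)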
To get the GPU IDs, you can run the code below:
import torch
print(torch.cuda.device_count())
print([i for i in range(torch.cuda.device_count())])
torch.cuda.device_count() gives the number of available CUDA devices, and the list comprehension builds a list of the device IDs. This gives you the available GPU device IDs.
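If it helps, you can also print each device's name to decide which ID to use (a small extension of the snippet above):

import torch

for i in range(torch.cuda.device_count()):
    # print each visible CUDA device together with its id
    print(i, torch.cuda.get_device_name(i))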
Upvotes: 0