Reputation: 1
I am trying to train Phi-3 on an ORPO dataset using the ORPOTrainer from Hugging Face's TRL library. My machine has 4 GPUs, so I would like to run multi-GPU training. This is my ORPOConfig:
import os

from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    learning_rate=0.00003,
    beta=0.1,
    lr_scheduler_type="linear",
    max_length=2048,
    max_prompt_length=2048,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=200,
    bf16=True,
    logging_steps=1,
    save_steps=500,
    warmup_steps=100,
    report_to="wandb",
    output_dir="./results/",
    remove_unused_columns=False,
    dataset_num_proc=os.cpu_count(),
)
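(For reference, if all 4 GPUs are used, this configuration gives an effective batch size of per_device_train_batch_size × gradient_accumulation_steps × number of GPUs = 8 × 4 × 4 = 128.)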
and this is the trainer:
trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=formatted_orpo_dataset["train"],
    eval_dataset=formatted_orpo_dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
The model was loaded with device_map="auto", but I get this error when the trainer starts: "Calculated loss must be on the original device: cuda:0 but device in use is cuda:3".
Has anyone else encountered this issue and resolved it?
Thank you.
Upvotes: 0
Views: 387
Reputation: 130
I had the same problem and found a solution here: instead of "auto", I now set the device_map from the Accelerator's process index so each process loads the model onto its own GPU:
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

[...]

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    # Pin the whole model to the GPU that belongs to this process
    # (assumes one process per GPU).
    device_map={"": accelerator.process_index},
    attn_implementation=attn_implementation,
)
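Note that device_map={"": accelerator.process_index} puts a full copy of the model on the GPU assigned to each process, so it assumes you start one process per GPU. A minimal launch sketch, assuming your training script is saved as train_orpo.py (hypothetical filename):

accelerate launch --num_processes 4 train_orpo.py

Accelerate then runs data-parallel training across the 4 processes, which avoids the device mismatch that can occur when device_map="auto" spreads the model's layers across several GPUs.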
Upvotes: 0