Reputation: 1
I am trying to train Phi-3 on an ORPO dataset using the ORPOTrainer from Hugging Face's TRL library. My machine has 4 GPUs, so I would like to run multi-GPU training. This is my ORPOConfig:
import os

from trl import ORPOConfig, ORPOTrainer

orpo_args = ORPOConfig(
    learning_rate=0.00003,
    beta=0.1,
    lr_scheduler_type="linear",
    max_length=2048,
    max_prompt_length=2048,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=4,
    optim="paged_adamw_8bit",
    num_train_epochs=3,
    evaluation_strategy="steps",
    eval_steps=200,
    bf16=True,
    logging_steps=1,
    save_steps=500,
    warmup_steps=100,
    report_to="wandb",
    output_dir="./results/",
    remove_unused_columns=False,
    dataset_num_proc=os.cpu_count(),
)
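(For reference, if all 4 GPUs are used, this configuration gives an effective batch size of per_device_train_batch_size × gradient_accumulation_steps × number of GPUs = 8 × 4 × 4 = 128.)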
and this is the trainer:
trainer = ORPOTrainer(
    model=model,
    args=orpo_args,
    train_dataset=formatted_orpo_dataset["train"],
    eval_dataset=formatted_orpo_dataset["test"],
    peft_config=peft_config,
    tokenizer=tokenizer,
)
The model was loaded with device_map="auto", but I get this error when the trainer starts: "Calculated loss must be on the original device: cuda:0 but device in use is cuda:3".
Has anyone else encountered this issue and resolved it?
Thank you.
Upvotes: 0
Views: 387
Reputation: 130
I had the same problem and found a solution here: instead of "auto", I now set the device_map from the Accelerator's process index so each process loads the model onto its own GPU:
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()

[...]

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=bnb_config,
    # Pin the whole model to the GPU that belongs to this process
    # (assumes one process per GPU).
    device_map={"": accelerator.process_index},
    attn_implementation=attn_implementation,
)
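Note that device_map={"": accelerator.process_index} puts a full copy of the model on the GPU assigned to each process, so it assumes you start one process per GPU. A minimal launch sketch, assuming your training script is saved as train_orpo.py (hypothetical filename):

accelerate launch --num_processes 4 train_orpo.py

Accelerate then runs data-parallel training across the 4 processes, which avoids the device mismatch that can occur when device_map="auto" spreads the model's layers across several GPUs.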
Upvotes: 0