Reputation: 405
My code is based on https://github.com/tloen/alpaca-lora/blob/main/finetune.py
My objective for this training is to make use of an unsupervised (raw text) dataset so the model learns how words are written in my domain (essentially causal language modelling on domain text). The reason I don't use conventional instruction fine-tuning is that no such dataset of sufficient quantity is available to me.
The two main changes I've made are as follows:
from peft import (
# LoraConfig,
PeftModel,
get_peft_model,
get_peft_model_state_dict,
prepare_model_for_int8_training,
set_peft_model_state_dict,
)
as well as
# config = LoraConfig(
# r=lora_r,
# lora_alpha=lora_alpha,
# target_modules=lora_target_modules,
# lora_dropout=lora_dropout,
# bias="none",
# task_type="CAUSAL_LM",
# )
# model = get_peft_model(model, config)
# replace with this to load directly from alpaca
LORA_WEIGHTS = "tloen/alpaca-lora-7b"
model = PeftModel.from_pretrained(
model,
LORA_WEIGHTS,
torch_dtype=torch.float16,
)
def chunk_text(data):
    # Concatenate every row of the training split into one long string
    concatenated_text = ''
    all_result = []
    for i in range(data['train'].num_rows):
        concatenated_text += data['train']['combined'][i]

    # Tokenize once, dropping the leading BOS token
    tokenized_concatenated_text = tokenizer.encode(concatenated_text)[1:]
    tokenized_prompt = tokenizer.encode("### Text: ")[1:]
    full_length = len(tokenized_concatenated_text)

    # Slice into overlapping chunks, prepend the prompt tokens, and append an
    # EOS token so each chunk becomes a complete training example
    for i in range(0, full_length, chunk_size):
        text = tokenized_concatenated_text[i: i + chunk_size + overlap_size]
        text = tokenized_prompt + text
        text = tokenizer.decode(text)
        result = tokenizer(text, padding=False)
        if result["input_ids"][-1] != tokenizer.eos_token_id:
            result["input_ids"].append(tokenizer.eos_token_id)
            result["attention_mask"].append(1)
        result["labels"] = result["input_ids"].copy()
        all_result.append(result)
    return all_result
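For completeness, this is roughly how I turn the chunks into a dataset for the Trainer (the chunk_size and overlap_size values here are just examples):
chunk_size = 256       # example value
overlap_size = 32      # example value

from datasets import Dataset
train_dataset = Dataset.from_list(chunk_text(data))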
However, I keep facing the following error no matter how I tweak the code. I'd really appreciate any help!
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <cell line: 2>:2 │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1662 in train │
│ │
│ 1659 │ │ inner_training_loop = find_executable_batch_size( │
│ 1660 │ │ │ self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size │
│ 1661 │ │ ) │
│ ❱ 1662 │ │ return inner_training_loop( │
│ 1663 │ │ │ args=args, │
│ 1664 │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1665 │ │ │ trial=trial, │
│ │
│ /usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1991 in _inner_training_loop │
│ │
│ 1988 │ │ │ │ │ │ │ xm.optimizer_step(self.optimizer) │
│ 1989 │ │ │ │ │ elif self.do_grad_scaling: │
│ 1990 │ │ │ │ │ │ scale_before = self.scaler.get_scale() │
│ ❱ 1991 │ │ │ │ │ │ self.scaler.step(self.optimizer) │
│ 1992 │ │ │ │ │ │ self.scaler.update() │
│ 1993 │ │ │ │ │ │ scale_after = self.scaler.get_scale() │
│ 1994 │ │ │ │ │ │ optimizer_was_run = scale_before <= scale_after │
│ │
│ /usr/local/lib/python3.9/dist-packages/torch/cuda/amp/grad_scaler.py:368 in step │
│ │
│ 365 │ │ if optimizer_state["stage"] is OptState.READY: │
│ 366 │ │ │ self.unscale_(optimizer) │
│ 367 │ │ │
│ ❱ 368 │ │ assert len(optimizer_state["found_inf_per_device"]) > 0, "No inf checks were rec │
│ 369 │ │ │
│ 370 │ │ retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs) │
│ 371 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AssertionError: No inf checks were recorded for this optimizer.
Environment: Python 3.9, CUDA 11.8
Upvotes: 6
Views: 10372
Reputation: 21
I removed --mixed_precision="fp16" from the training script and that resolved the error for me.
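If you configure training in Python rather than via launch flags, I believe the equivalent is to turn off fp16 in TrainingArguments; a rough sketch (all other arguments are placeholders):
from transformers import TrainingArguments

# Placeholder values; the relevant part is fp16=False, which disables the
# AMP GradScaler path that raises "No inf checks were recorded for this optimizer."
training_args = TrainingArguments(
    output_dir="./output",            # hypothetical path
    per_device_train_batch_size=4,
    num_train_epochs=1,
    fp16=False,                       # equivalent of dropping --mixed_precision="fp16"
)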
Upvotes: 2
Reputation: 1575
This does not directly answer your question, but it provides an alternative, recommended way of achieving the same objective. You can export the Llama + LoRA weights locally as a Hugging Face checkpoint using the export_hf_checkpoint.py script. This saved checkpoint becomes the new base model, and you can then tune a new LoRA adapter on top of it.
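If your peft version supports merge_and_unload, a rough sketch of the same idea without the script (the model ids below are just examples):
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

# example model ids; substitute your own base model and adapter
base_model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,
)
lora_model = PeftModel.from_pretrained(
    base_model,
    "tloen/alpaca-lora-7b",
    torch_dtype=torch.float16,
)

# fold the LoRA weights into the base weights and save the merged model;
# this checkpoint then acts as the new base model for a fresh LoRA run
merged_model = lora_model.merge_and_unload()
merged_model.save_pretrained("./llama-7b-alpaca-merged")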
Upvotes: 0
Reputation: 365
For anybody struggling with this issue because they are loading a pretrained config, there may be a flag for inference_mode which needs to be changed:
LORA_WEIGHTS = 'tloen/alpaca-lora-7b'
model = PeftModel.from_pretrained(model, LORA_WEIGHTS, torch_dtype=torch.float16)

# reload the adapter's config, mark it as trainable, and wrap the model again
# (note: this requires importing LoraConfig from peft)
config = LoraConfig.from_pretrained(LORA_WEIGHTS)
config.inference_mode = False
model = get_peft_model(model, config)
With inference_mode set to True, you are freezing the adapter (source code):
if self.peft_config[adapter_name].inference_mode:
_freeze_adapter(self.model, adapter_name)
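After setting inference_mode = False and calling get_peft_model again, you can sanity-check that the adapter is actually trainable:
# should report a non-zero count of trainable parameters
model.print_trainable_parameters()

# or inspect explicitly which parameters require gradients
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(f"{len(trainable)} trainable parameter tensors")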
Upvotes: 4
Reputation: 1
You shouldn't comment out the LoraConfig; instead, call model = get_peft_model(model, config)
after model = PeftModel.from_pretrained(model, LORA_WEIGHTS, torch_dtype=torch.float16)
Upvotes: 0