Reputation: 1136
I wanted to train an LLM on a custom library with tons of functions. The intent is to be able to generate code using an LLM trained on this custom library. As the library is pretty huge, I run out of tokens when adding it to the LLM's context (I am using gpt-4-32k).
Upvotes: 1
Views: 784
Reputation: 502
If you're looking to achieve high-quality generations without spending millions to fine-tune a model, consider using PEFT (Parameter-Efficient Fine-Tuning).
PEFT, an open-source library from Hugging Face, enables fine-tuning of pre-trained language models (PLMs) without modifying all of the model's parameters. It currently includes techniques such as LoRA, QLoRA, and P-Tuning.
QLoRA is an efficient fine-tuning technique that first quantizes a PLM to 4 bits and then attaches small trainable "Low-Rank Adapters" (LoRA) on top of the frozen, quantized weights. This enables you to fine-tune models with tens of billions of parameters on a single GPU.
By integrating these techniques, you can fine-tune large models more efficiently and effectively, making the most of your computational resources.
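As a rough illustration, a typical QLoRA setup with transformers, bitsandbytes, and peft looks something like the sketch below. The model name and target_modules are placeholders; which modules to target depends on the architecture you pick.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder: any causal LM checkpoint

# load the base model quantized to 4 bits (NF4)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# attach low-rank adapters; only these small matrices are trained
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # placeholder: architecture-dependent
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all parameters

You can then train this model on (prompt, code) pairs drawn from your library with the usual Trainer setup; only the adapter weights are updated.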
--
Depending on your hardware and model size, you can even run the fine-tuned LLM on your own device. You can use speculative sampling to increase token-generation speed while keeping generation quality intact, since the large model verifies every draft token and the output distribution is preserved.
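Hugging Face transformers exposes one form of this as assisted generation: you pass a small draft model via the assistant_model argument of generate. A minimal sketch, assuming a reasonably recent transformers version; the checkpoints below are placeholders, and the draft model must use the same tokenizer as the target model.

from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "bigcode/starcoderbase"     # placeholder: your fine-tuned model
draft_name = "bigcode/tiny_starcoder_py"  # placeholder: a small same-tokenizer model

tokenizer = AutoTokenizer.from_pretrained(target_name)
model = AutoModelForCausalLM.from_pretrained(target_name, device_map="auto")
draft = AutoModelForCausalLM.from_pretrained(draft_name, device_map="auto")

inputs = tokenizer("def fibonacci(n):", return_tensors="pt").to(model.device)
# assistant_model enables assisted (speculative) decoding:
# the draft model proposes tokens, the target model verifies them
outputs = model.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))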
--
To improve generation quality further, you can even apply reinforcement learning: provide a chunk of code as a prompt, sample a completion, score it, and backpropagate to adjust the weights, as sketched below.
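A minimal REINFORCE-style sketch of that loop, under loud assumptions: gpt2 stands in for your fine-tuned model, and reward_fn is a hypothetical reward that only checks whether the completion compiles. In practice you would use a library such as TRL and a richer reward signal.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def reward_fn(code: str) -> float:
    # hypothetical reward: +1 if the code compiles as Python, else -1
    try:
        compile(code, "<gen>", "exec")
        return 1.0
    except Exception:
        return -1.0

tok = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

prompt = "def add(a, b):\n"
inputs = tok(prompt, return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

# sample a completion for the code chunk
gen = model.generate(**inputs, do_sample=True, max_new_tokens=32,
                     pad_token_id=tok.eos_token_id)
completion = tok.decode(gen[0, prompt_len:])
reward = reward_fn(prompt + completion)

# REINFORCE: scale the completion's log-likelihood by its reward and backprop
logits = model(gen).logits[:, :-1]
logp = torch.log_softmax(logits, dim=-1).gather(2, gen[:, 1:].unsqueeze(-1)).squeeze(-1)
loss = -reward * logp[:, prompt_len - 1:].sum()
loss.backward()
opt.step()
opt.zero_grad()

Note that a single sample with no baseline gives a very noisy gradient; PPO-style methods with a reference model (as in TRL) are the usual choice for doing this at scale.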
Upvotes: 2