Tolu

Reputation: 1147

Fine-tuning an LM vs. prompt-engineering an LLM

Is it possible to fine-tune a much smaller language model like RoBERTa on, say, a customer service dataset and get results as good as one might get by prompting GPT-4 with parts of the dataset?

Can a fine-tuned RoBERTa model learn to follow instructions in a conversational manner, at least for a small domain like this?

Is there any paper or article that explores this issue empirically that I can check out?

Upvotes: 3

Views: 2739

Answers (2)

Nathaniel Mahowald

Reputation: 1

I found a piece which offers another perspective here. Certainly, in straightforward performance testing there's a tradeoff. But another factor to consider is whether all your test cases fall within, or very close to, the original data, and to what extent you might later want to adjust model behavior as your use case evolves. Fine-tuning is much more rigid when you need to change behavior based on things you discover along the way, and it sometimes doesn't adapt well to unexpected situations.

Upvotes: 0

Tolu

Reputation: 1147

I found a Medium piece which goes a long way toward clarifying this here.

Quoting from the conclusion of the above:

In the low data domain, prompting shows superior performance to the respective fine-tuning method. To beat the SOTA benchmarks in fine-tuning, leveraging large frozen language models in combination with tuning a soft prompt seems to be the way forward.

It appears prompting an LLM may outperform fine-tuning a smaller model on domain-specific tasks when the training data is small, and the reverse when the data is plentiful.

Additionally, in my own personal, anecdotal experience with ChatGPT, Bard, Bing, Vicuna-3b, Dolly-v2-12b and LLaMA-13b, it appears models of the size of ChatGPT, Bard and Bing have learned to mimic human understanding of language well enough to extract meaningful answers from context provided at inference time. The smaller models do not seem to have that mimicry-mastery and might not perform as well with in-context learning at inference time. Yet these medium-sized models (the 12-13B ones) might also be too large to be well suited for fine-tuning in a very limited domain. My hunch is that for very limited domains, if one is going the fine-tuning route, fine-tuning much smaller models like BERT or RoBERTa (or smaller variants of GPT-2 or GPT-J, for generative tasks) rather than these medium-sized models might be the more prudent approach resource-wise.

An alternative to fine-tuning the smaller models on domain data could be to use more carefully and rigorously crafted prompts with the medium-sized models. This could be a viable alternative to using the APIs provided by the owners of the very large proprietary models.
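To make the prompting route concrete, here is a minimal sketch of assembling a few-shot prompt from labelled domain examples, which is what "prompting with parts of the dataset" amounts to in practice. The dataset rows, labels and prompt wording are all illustrative assumptions, not taken from any real dataset; the resulting string would be sent to whichever instruction-following model you use.

```python
# Sketch: build a few-shot, in-context-learning prompt from a toy
# customer-service dataset. All example data here is made up.

EXAMPLES = [
    {"query": "My order arrived damaged.",
     "reply": "Sorry to hear that! Please share your order number and "
              "we will arrange a replacement."},
    {"query": "How do I reset my password?",
     "reply": "Click 'Forgot password' on the login page and follow "
              "the emailed link."},
]

def build_few_shot_prompt(examples, new_query):
    """Concatenate an instruction, labelled examples, and the new query
    into a single prompt string."""
    parts = ["You are a helpful customer-service agent. "
             "Answer in the same style as the examples below."]
    for ex in examples:
        parts.append(f"Customer: {ex['query']}\nAgent: {ex['reply']}")
    # Leave the final 'Agent:' turn open for the model to complete.
    parts.append(f"Customer: Where is my refund?\nAgent:"
                 if new_query is None else
                 f"Customer: {new_query}\nAgent:")
    return "\n\n".join(parts)

prompt = build_few_shot_prompt(EXAMPLES, "Where is my refund?")
print(prompt)
```

The tradeoff discussed above then becomes a question of how many such examples fit in the model's context window versus how many you would need to fine-tune on.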

Upvotes: 7
