Reputation: 35
As we head into the turbulent times of AI, I am adding my drop to the ocean as well. I work in Python, so all my attempts are done in Python/Anaconda.
Does anybody already have experience with the data formats that can be passed to the GPT family of models?
The documentation recommends using the OpenAI tool to check the data, and then recommends a format with "Prompt" and "Completion" fields, with strings marked as follows:
["str" = in quotes, "/" = separator, "@>" = unique symbol,
" " = completion starts with an empty space]
'Prompt': 'Hello AI..!!/@>'
'Completion': ' How are you today?/@>'
The "Completion" should have an empty space at the start of every string. So far I have only been able to find simple examples such as:
Col1          Col2
'Prompt':     'Completion':
'Text/@>'     ' Text/@>'
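For reference, the simple two-column case can be sketched in Python with only the standard library. This is a minimal sketch, not an official recipe: the `/@>` end marker is taken from the examples in this question, and `make_record` and `to_jsonl` are hypothetical helper names.

```python
import json

SEPARATOR = "/@>"  # assumed end marker, copied from the examples in the question


def make_record(prompt: str, completion: str) -> dict:
    """Build one prompt-completion pair (hypothetical helper)."""
    return {
        "prompt": prompt.rstrip() + SEPARATOR,
        # The completion starts with a single space and ends with the marker.
        "completion": " " + completion.strip() + SEPARATOR,
    }


def to_jsonl(pairs) -> str:
    """Serialize (prompt, completion) tuples as one JSON object per line."""
    return "\n".join(json.dumps(make_record(p, c)) for p, c in pairs)


pairs = [("Hello AI..!!", "How are you today?")]
jsonl_text = to_jsonl(pairs)
print(jsonl_text)
```

Each output line is an independent JSON object, so examples can be appended to the file without touching the ones already there.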
Is there any way it will understand a more complex dataset? Is it effective to use a DataFrame with more dimensions? Example:
Col1           Col2              Col3           Col4
'Prompt_a':    'Completion_a':   'Prompt_b':    'Completion_b':
'Text/@>'      ' Text/@>'        'Text/@>'      ' Text/@>'
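If the format really is one prompt-completion pair per training example, as the documentation quoted in the answer states, a wider table like this would presumably have to be flattened back into two columns first. A minimal sketch under that assumption, using plain dictionaries (the shape `DataFrame.to_dict("records")` produces in pandas); the column names come from the example above and `flatten` is a hypothetical helper:

```python
# Each wide row holds two independent examples (columns named as in the question).
wide_rows = [
    {
        "Prompt_a": "Text/@>", "Completion_a": " Text/@>",
        "Prompt_b": "Text/@>", "Completion_b": " Text/@>",
    },
]


def flatten(rows, suffixes=("a", "b")):
    """Turn every Prompt_x/Completion_x column pair into its own example."""
    examples = []
    for row in rows:
        for suffix in suffixes:
            examples.append({
                "prompt": row[f"Prompt_{suffix}"],
                "completion": row[f"Completion_{suffix}"],
            })
    return examples


long_rows = flatten(wide_rows)
print(len(long_rows), "examples with keys", sorted(long_rows[0]))
```

After flattening, the extra columns carry no information of their own: every example is just a `prompt` and a `completion`.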
Is a longer context passed as a single 'str/@>' string, or does it need to be partitioned?
' text text text /@>'
Many thanks in advance for all answers and efforts.
Already checked: https://help.openai.com/en/articles/6811186-how-do-i-format-my-fine-tuning-data
Upvotes: 1
Views: 1700
Reputation: 22920
As stated in the official OpenAI documentation:
Your data must be a JSONL document, where each line is a prompt-completion pair corresponding to a training example. You can use our CLI data preparation tool to easily convert your data into this file format.
This tool accepts different formats, with the only requirement that they contain a prompt and a completion column/key. You can pass a CSV, TSV, XLSX, JSON or JSONL file, and it will save the output into a JSONL file ready for fine-tuning, after guiding you through the process of suggested changes.
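Before handing a file to the preparation tool, a quick local sanity check can catch lines that are not valid JSON or that lack one of the two required keys. The following is only a sketch of such a check, not the logic of the OpenAI tool itself; `check_jsonl` is a hypothetical helper name.

```python
import json

REQUIRED_KEYS = {"prompt", "completion"}


def check_jsonl(text: str) -> list:
    """Return a list of human-readable problems found in JSONL text."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {number}: not valid JSON")
            continue
        if not isinstance(record, dict):
            problems.append(f"line {number}: not a JSON object")
            continue
        missing = REQUIRED_KEYS - set(record)
        if missing:
            problems.append(f"line {number}: missing {sorted(missing)}")
    return problems


sample = "\n".join([
    '{"prompt": "Hello AI..!!/@>", "completion": " How are you today?/@>"}',
    '{"prompt": "No completion here/@>"}',
])
for problem in check_jsonl(sample):
    print(problem)
```

An empty result means every line parsed as a JSON object with both keys; it says nothing about whether the content itself is good training data.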
Upvotes: 1