Reputation: 1
I'm facing a challenge with extracting JSON data from files using different AI models and could use some help.
Problem Description: When I provide file content to GPT-3.5 Turbo, I receive a complete JSON output. However, when using the GPT-4o mini model via the external OpenAI API, which has a higher input token limit of 8192, I'm not getting the full JSON data.
Details:
- Models: GPT-3.5 Turbo vs. GPT-4o mini
- Input token limit: 8192 (GPT-4o mini) vs. a lower limit for GPT-3.5 Turbo
- Issue: Despite the higher token limit of GPT-4o mini, I'm receiving incomplete JSON data from it, whereas GPT-3.5 Turbo provides the full output.

Steps Taken:

1. Provided file content to GPT-3.5 Turbo – received the full JSON output.
2. Used the external OpenAI API with GPT-4o mini – resulted in incomplete JSON data.

Question: Has anyone experienced similar issues with token limits affecting the completeness of data extraction? What are some best practices for ensuring complete JSON extraction with models like GPT-4o mini, especially when handling larger inputs?
Any advice or suggestions would be greatly appreciated!
Thanks in advance for your help!
Upvotes: 0
Views: 150
Reputation: 9239
It's unclear from the question whether your response is incomplete in the sense that the output is cut off mid-way or that expected fields are simply missing, so I might be missing the point, but here are some generic ideas:
Check that the max_tokens parameter is not set too low for the response.

You mentioned accessing GPT-4o-mini with an external API (I guess this differs from how you access ChatGPT-3.5-turbo). If you are using some library to perform the query, you might want to make sure that it doesn't set the max_tokens parameter to a value that's too low for your use case. Some libraries default to a maximum of 1000 output tokens or similar unless you explicitly set it to a higher number.
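For example, with the official OpenAI Python SDK you can pass max_tokens explicitly and inspect finish_reason to see whether the reply was cut off. A minimal sketch, assuming the v1-style openai package; the file name and the 4096 limit are placeholders you'd adjust for your data:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Placeholder input file -- substitute whatever file you are extracting from.
with open("input.txt", "r", encoding="utf-8") as f:
    file_content = f.read()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Extract the data from the file as JSON."},
        {"role": "user", "content": file_content},
    ],
    max_tokens=4096,  # set explicitly; some wrappers default much lower
)

choice = response.choices[0]
print(choice.message.content)

# finish_reason == "length" means the model hit max_tokens and the JSON
# was truncated mid-output.
if choice.finish_reason == "length":
    print("Warning: output was cut off; raise max_tokens or shrink the input.")
```

Checking finish_reason is the quickest way to tell truncation apart from the model simply omitting fields.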
If the data only slightly exceeds the input/output limits, you can sometimes mitigate this by removing unnecessary JSON indentation, thereby reducing the token count (sometimes by 50% or more).
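As a rough illustration, Python's standard json module can strip the indentation before you build the prompt; the actual savings depend on how heavily the original file is formatted:

```python
import json

# Placeholder file name -- use the JSON file you are sending to the model.
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)

# Pretty-printed form: indentation and spacing inflate the token count.
pretty = json.dumps(data, indent=4)

# Minified form: no indentation, no spaces after separators.
minified = json.dumps(data, separators=(",", ":"))

print(len(pretty), "chars pretty vs", len(minified), "chars minified")
```

Character count is only a proxy for tokens, but for whitespace-heavy JSON the two shrink together.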
Upvotes: 0