Reputation: 69
This is my code for calling the gpt-4 model:
import openai

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": req},
]

response = openai.ChatCompletion.create(
    engine="******-gpt-4-32k",
    messages=messages,
    temperature=0,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

answer = response["choices"][0]["message"]["content"]
Keeping system_msg & req constant, with temperature=0, I still get different answers. For instance, the last time I ran this 10 times, I got 3 different answers. The answers are similar in concept but differ semantically.
Why is this happening?
Upvotes: 3
Views: 8624
Reputation: 83387
This blog post, authored by Sherman Chann, argues that:
Non-determinism in GPT-4 is caused by Sparse MoE [mixture of experts].
Note that it’s now possible to set a seed parameter. From platform.openai.com/docs (mirror):
Reproducible outputs (Beta)
Chat Completions are non-deterministic by default (which means model outputs may differ from request to request). That being said, we offer some control towards deterministic outputs by giving you access to the seed parameter and the system_fingerprint response field.
To receive (mostly) deterministic outputs across API calls, you can:
Set the seed parameter to any integer of your choice and use the same value across requests you'd like deterministic outputs for.
Ensure all other parameters (like prompt or temperature) are the exact same across requests.
Sometimes, determinism may be impacted due to necessary changes OpenAI makes to model configurations on our end. To help you keep track of these changes, we expose the system_fingerprint field. If this value is different, you may see different outputs due to changes we've made on our systems.
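As a minimal sketch in the same old-style SDK call as the question (the seed value is arbitrary, and a recent enough SDK/API version is assumed for seed; whether system_fingerprint is returned depends on the model/deployment):

response = openai.ChatCompletion.create(
    engine="******-gpt-4-32k",
    messages=messages,
    temperature=0,
    seed=12345,  # reuse the same integer across requests you want to be reproducible
)

answer = response["choices"][0]["message"]["content"]

# If this value changes between calls, OpenAI changed the backend
# configuration, so outputs may differ even with the same seed.
fingerprint = response.get("system_fingerprint")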
Upvotes: 5
Reputation: 1688
My understanding of GPT-4 instability:
One possible explanation is that they run multiple endpoints to handle the volume of incoming requests; these do not have exactly the same setup/initialisation and are not consistent with one another. But if requests happen to land on the same endpoint, the output is consistent (same process, same time window, likely same HTTP connection).
On top of that, it is also very likely that OpenAI runs parallel A/B tests with different service setups and different models.
I'm not saying that GPT-4 is fully deterministic (with proper parameters), but many other explanations are possible.
Upvotes: 0
Reputation: 69
Found a solution here: https://community.openai.com/t/observing-discrepancy-in-completions-with-temperature-0/73380
TL;DR: some discrepancies occur due to floating-point operations when two tokens have very close probabilities. Even a single-token change affects the rest of the sequence and leads to divergent generations.
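A toy sketch of how this plays out (not the actual sampling code): when two candidate tokens' logits are almost tied, a floating-point-sized perturbation can flip the greedy (temperature=0) choice, and everything generated after that token diverges.

import numpy as np

# Two candidate next tokens whose logits are almost tied.
logits_run_a = np.array([10.0, 10.0 - 1e-6], dtype=np.float32)

# Differences in batching or summation order can perturb logits
# by roughly this magnitude between otherwise identical requests.
logits_run_b = logits_run_a + np.array([0.0, 2e-6], dtype=np.float32)

print(int(np.argmax(logits_run_a)))  # 0 -> first token wins this run
print(int(np.argmax(logits_run_b)))  # 1 -> second token wins this run

# With greedy decoding, the chosen token is fed back into the model,
# so a single flipped token makes every subsequent token diverge too.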
Upvotes: 1