Kavya Bhandari

Reputation: 69

Why is GPT-4 giving different answers with the same prompt and temperature=0?

This is my code for calling the gpt-4 model:

import openai  # legacy 0.x SDK (openai.ChatCompletion interface)
# (Azure OpenAI config -- api_type/api_base/api_version/api_key -- is set elsewhere)

messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": req},
]

response = openai.ChatCompletion.create(
    engine="******-gpt-4-32k",  # Azure deployment name
    messages=messages,
    temperature=0,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
)

answer = response["choices"][0]["message"]["content"]

Keeping system_msg and req constant, with temperature=0, I get different answers. For instance, the last time I ran this 10 times, I got 3 different answers. The answers are similar in concept but differ in their exact wording and details.
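
For reference, the spread can be measured with a loop like this (a sketch reusing the call above):

# Repeat the identical request and count distinct completions.
outputs = set()
for _ in range(10):
    r = openai.ChatCompletion.create(
        engine="******-gpt-4-32k",
        messages=messages,
        temperature=0,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
    )
    outputs.add(r["choices"][0]["message"]["content"])

print(len(outputs), "distinct answers in 10 runs")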

Why is this happening?

Upvotes: 3

Views: 8624

Answers (3)

Franck Dernoncourt

Reputation: 83387

This blog post by Sherman Chann argues that:

Non-determinism in GPT-4 is caused by Sparse MoE [mixture of experts].
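
A toy sketch of that argument (illustrative only; not GPT-4's actual router): with capacity-limited expert routing, which expert handles a given token depends on the other tokens batched with it, so requests co-batched with other users' traffic can go through different computation:

import numpy as np

CAPACITY = 2  # toy limit: each expert accepts at most 2 tokens per batch

def route(batch):
    """Greedy capacity-limited top-1 routing, processed in batch order."""
    load = np.zeros(4, dtype=int)
    assigned = []
    for scores in batch:               # scores = this token's affinity per expert
        for e in np.argsort(-scores):  # try the preferred expert first
            if load[e] < CAPACITY:
                load[e] += 1
                assigned.append(int(e))
                break
    return assigned

my_token = np.array([1.0, 0.9, 0.0, 0.0])     # prefers expert 0, then 1
crowd = [np.array([1.0, 0.0, 0.0, 0.0])] * 3  # all strongly prefer expert 0

print(route([my_token, *crowd])[0])   # 0: my token claims expert 0 first
print(route([*crowd, my_token])[-1])  # 1: expert 0 is full, falls back to expert 1

The same token lands on a different expert purely because of what else was in the batch, which is the mechanism the post blames for the non-determinism.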

Note that it’s now possible to set a seed parameter. From platform.openai.com/docs (mirror):

Reproducible outputs

Beta

Chat Completions are non-deterministic by default (which means model outputs may differ from request to request). That being said, we offer some control towards deterministic outputs by giving you access to the seed parameter and the system_fingerprint response field.

To receive (mostly) deterministic outputs across API calls, you can:

  • Set the seed parameter to any integer of your choice and use the same value across requests you'd like deterministic outputs for.

  • Ensure all other parameters (like prompt or temperature) are the exact same across requests.

Sometimes, determinism may be impacted due to necessary changes OpenAI makes to model configurations on our end. To help you keep track of these changes, we expose the system_fingerprint field. If this value is different, you may see different outputs due to changes we've made on our systems.
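
A minimal sketch of using these two knobs (assuming the v1 openai Python SDK; on Azure you would target your deployment instead of a plain model name):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4",  # assumption: plain OpenAI endpoint, not an Azure deployment
    messages=[{"role": "user", "content": "Say hello."}],
    temperature=0,
    seed=42,        # same seed + identical parameters => (mostly) same output
)

print(resp.system_fingerprint)          # if this changes between calls,
print(resp.choices[0].message.content)  # outputs may legitimately differ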

Upvotes: 5

gdupont

Reputation: 1688

My understanding of GPT-4's instability:

  • within a single process making sequential calls to the API with temperature at 0.0 and top_p at 0.0, the API is consistent and stable (i.e. the same input prompt leads to the same output)
  • the same prompt sent from a new process can produce different results

One possible explanation is that they run multiple endpoints to handle the volume of incoming requests; these do not have the exact same setup/initialisation and are not consistent with one another. But if requests happen to land on the same endpoint, the output is consistent (same process, same time window, likely the same HTTP connection).
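
One way to probe that hypothesis (a sketch against the plain OpenAI REST endpoint; an Azure deployment would use a different URL and auth header) is to compare a reused HTTP connection against fresh ones:

import os
import requests

URL = "https://api.openai.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}
BODY = {
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Describe entropy in one sentence."}],
    "temperature": 0,
    "top_p": 0,
}

def answer(resp):
    return resp.json()["choices"][0]["message"]["content"]

# Reused session: keep-alive makes it likely that every call hits the
# same endpoint/instance behind the load balancer.
with requests.Session() as s:
    same_conn = {answer(s.post(URL, headers=HEADERS, json=BODY)) for _ in range(5)}

# Fresh connection per call: requests may be spread across endpoints.
new_conn = {answer(requests.post(URL, headers=HEADERS, json=BODY)) for _ in range(5)}

print("reused connection, distinct answers:", len(same_conn))
print("fresh connections, distinct answers:", len(new_conn))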

On top of that, it is also very likely that OpenAI runs parallel A/B tests with different service setups and different models.

I'm not saying that GPT-4 is fully deterministic (even with the proper parameters), but many explanations other than inherent model randomness are possible.

Upvotes: 0

Kavya Bhandari

Reputation: 69

Found a solution here: https://community.openai.com/t/observing-discrepancy-in-completions-with-temperature-0/73380

TL;DR: some discrepancies arise from floating-point operations when two tokens have very close probabilities. Even a single-token change then affects the whole chain and leads to divergent generations.
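
A self-contained illustration of the floating-point point (NumPy, float32): the same contributions summed in a different order can land on different values, which is enough to flip a greedy (temperature=0) argmax between two near-tied token logits:

import numpy as np

def accumulate(parts):
    """Sum float32 contributions strictly left to right."""
    total = np.float32(0.0)
    for p in parts:
        total = np.float32(total + p)
    return total

# The same four contributions, summed in two different orders.
parts = [np.float32(x) for x in (1e8, 7.0, -1e8, 0.25)]
a = accumulate(parts)        # 8.25
b = accumulate(parts[::-1])  # 8.0 -- rounding differs with this order

# A rival token whose logit sits between the two results: the greedy
# pick flips depending on how the sum happened to be computed.
rival = np.float32(8.125)
print(np.argmax([a, rival]))  # 0: original token wins
print(np.argmax([b, rival]))  # 1: rival wins; the generation diverges here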

Upvotes: 1
