p0rter

Reputation: 989

Why is the consumption of OpenAI tokens in Azure hybrid search 100x higher compared to 'regular' prompts?

We are using OpenAI via Microsoft Azure. Doing 'regular' prompts via the API leads to the following output:

{
  choices: [
    {
      content_filter_results: [Object],
      finish_reason: 'stop',
      index: 0,
      logprobs: null,
      message: [Object]
    }
  ],
  created: 1721653544,
  id: 'chatcmpl-9nn1UksGuyliAGQsnqUrh3ADv5UhR',
  model: 'gpt-4o-2024-05-13',
  object: 'chat.completion',
  prompt_filter_results: [ { prompt_index: 0, content_filter_results: [Object] } ],
  system_fingerprint: 'fp_abc28019ad',
  usage: { completion_tokens: 77, prompt_tokens: 36, total_tokens: 113 }
}
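
For reference, this 'regular' call is just a plain chat completion request against the deployment, roughly like this (simplified sketch using Node 18+ fetch; the endpoint, deployment name, API version and env var names are placeholders, not our exact values):

const response = await fetch(
    `${process.env.OPENAI_ENDPOINT}/openai/deployments/${process.env.OPENAI_DEPLOYMENT_ID}` +
        `/chat/completions?api-version=2024-02-15-preview`,
    {
        method: "POST",
        headers: {
            "Content-Type": "application/json",
            "api-key": process.env.OPEN_AI_API_KEY,
        },
        body: JSON.stringify({
            messages: [
                { role: "system", content: "You are a basic assistant." },
                { role: "user", content: "What is math?" },
            ],
            temperature: 0.7,
            max_tokens: 200,
        }),
    }
);

const completion = await response.json();
// Only the two messages above are billed as prompt tokens (~36 in the output shown).
console.log(completion.usage);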

Doing prompts that query the search endpoint leads to:

{
  id: 'afdcc11f-66e6-412b-8c03-5ee25a20d249',
  model: 'gpt-4o',
  created: 1721654963,
  object: 'extensions.chat.completion',
  choices: [ { index: 0, finish_reason: 'stop', message: [Object] } ],
  usage: { prompt_tokens: 6193, completion_tokens: 32, total_tokens: 6225 },
  system_fingerprint: 'fp_abc28019ad'
}

Please notice the extraordinarily higher number of prompt tokens.

Here is our setup for the search API call:

const SEARCH_BODY_TEMPLATE = {
    data_sources: [
        {
            type: "azure_search",
            parameters: {
                filter: null,
                endpoint: process.env.SEARCH_SERVICE_ENDPOINT,
                index_name: process.env.SEARCH_INDEX_NAME,
                project_resource_id: process.env.SEARCH_PROJECT_RESOURCE_ID,
                semantic_configuration: "azureml-default",
                authentication: {
                    type: "system_assigned_managed_identity",
                    key: null
                },
                role_information: "Your name is POC. You are an intelligent assistant that has been developed to help all employees. Keep your answers short and clear.",
                in_scope: true,
                strictness: 1,
                top_n_documents: 3,
                key: process.env.SEARCH_KEY,
                embedding_endpoint: "https://xxx.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15",
                embedding_key: process.env.OPEN_AI_API_KEY,
                query_type: "vectorSimpleHybrid"
            }
        }
    ],
    messages: [{
        role: "system",
        content: "You are a basic assitant. Answer only if you really know. Otherwise answer 'i don't know'."
    },
    {
        role: "user",
        content: "What is math?"
    }],
    deployment: process.env.SEARCH_DEPLOYMENT_ID,
    temperature: 0.7,
    top_p: 0.95,
    max_tokens: 200,
    stop: null,
    frequency_penalty: 0,
    presence_penalty: 0,
}
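
This template is then posted to the deployment's chat completions endpoint, roughly like this (simplified sketch; the exact route and api-version depend on which API version you target, so treat the values below as placeholders — older versions use /extensions/chat/completions, newer ones accept data_sources on the plain route):

const url = `${process.env.OPENAI_ENDPOINT}/openai/deployments/${process.env.SEARCH_DEPLOYMENT_ID}` +
    `/extensions/chat/completions?api-version=2023-08-01-preview`;

const response = await fetch(url, {
    method: "POST",
    headers: {
        "Content-Type": "application/json",
        "api-key": process.env.OPEN_AI_API_KEY,
    },
    body: JSON.stringify(SEARCH_BODY_TEMPLATE),
});

const completion = await response.json();
// usage.prompt_tokens here also includes the retrieved chunks, not just our two messages.
console.log(completion.usage);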

When attaching data to the index, we also did the following to try to increase performance or lower the number of tokens consumed (with no effect):

So we end up with some questions:

Upvotes: 0

Views: 300

Answers (1)

Nicolas R

Reputation: 14619

I think you did not get how RAG works.

It's normal to see this increase in the number of tokens processed. When you ask a simple question without any search attached to it, you just have:

  • your question sent directly to the GPT model, so only your question text is counted as prompt tokens
  • the generated answer counted as completion tokens

When you use Search (whether it is simple / semantic / hybrid, etc.), the process is the following:

  • your query is sent to your Search service to find the top N matching documents (in your case, N = 3, since you set "top_n_documents: 3" in your query), so you get 3 blocks of around X tokens each (X depends on the chunking strategy of your index)
  • your query and these chunks are concatenated and sent to your GPT model: as a consequence, you get a lot more prompt tokens (initial query + all the tokens from the 3 chunks retrieved in the previous step)

You can see it illustrated in the "Basic RAG architecture with Azure AI Search" diagram: Azure OpenAI gets "Prompt + Knowledge" as its input.
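
As a rough back-of-the-envelope estimate (hypothetical chunk sizes, not taken from your index), the prompt token count breaks down roughly like this:

// Rough illustration of where the prompt tokens go (hypothetical numbers;
// the actual values depend on your chunking strategy and documents).
const questionTokens = 36;         // roughly what your 'regular' prompt consumed
const roleInformationTokens = 30;  // system message / role_information text (estimate)
const chunkTokens = 2000;          // average tokens per retrieved chunk (assumption)
const topNDocuments = 3;           // top_n_documents in your query

const estimatedPromptTokens =
    questionTokens + roleInformationTokens + topNDocuments * chunkTokens;

console.log(estimatedPromptTokens); // ~6066, the same order of magnitude as the 6193 you observed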

Lowering this number of tokens can be done through several actions:

  • reduce the size of the chunks when ingesting into Azure AI Search
  • reduce the top N documents value to have fewer documents retrieved from Azure AI Search (see the sketch below)
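
For example, keeping your SEARCH_BODY_TEMPLATE as-is and only tightening the retrieval parameters already shrinks the retrieved context (a sketch with illustrative values, not a recommendation for your data):

// Sketch: fewer / better-filtered documents means fewer chunk tokens in the prompt.
const LEANER_SEARCH_PARAMETERS = {
    ...SEARCH_BODY_TEMPLATE.data_sources[0].parameters,
    top_n_documents: 1,   // was 3: roughly one third of the retrieved context
    strictness: 3,        // higher strictness drops weaker matches before they reach the prompt
};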

Obviously, it highly depends on your documents' content and format, because you still need the right information in those retrieved items if you want the LLM to answer correctly: that's a quality/cost balance to find.

Upvotes: 2
