
How can I define the title for each document I import into Azure OpenAI?

I imported some text files into Azure OpenAI:

After the import, I see a "title" field used for search:

which I can't edit via UI as it's greyed out:

How can I define the title for each document? For example, does the Azure OpenAI On Your Data API allow me to define the title for each document?

By default, titles are prepopulated via automated summarization (which seems to be simple truncation). I can see some of the titles with, for example:

import os
import pprint

from openai import AzureOpenAI
#from azure.identity import DefaultAzureCredential, get_bearer_token_provider

endpoint = os.getenv("ENDPOINT_URL", "https://[redacted].openai.azure.com/")
deployment = os.getenv("DEPLOYMENT_NAME", "[redacted GPT engine name]")
search_endpoint = os.getenv("SEARCH_ENDPOINT", "https://[redacted].search.windows.net")
search_key = os.getenv("SEARCH_KEY", "[redacted key]")
search_index = os.getenv("SEARCH_INDEX_NAME", "[redacted]")

# token_provider = get_bearer_token_provider(
#     DefaultAzureCredential(),
#     "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_version="2024-05-01-preview",
    api_key='[redacted key]'
)
# azure_ad_token_provider=token_provider,

completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "How can I sort a Python list?"
        }],
    max_tokens=800,
    temperature=0,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
    stream=False,
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": f"{search_endpoint}",
                "index_name": search_index,
                "semantic_configuration": "default",
                "query_type": "vector_semantic_hybrid",
                "fields_mapping": {},
                "in_scope": True,
                "role_information": "You are an AI assistant that helps people find information.",
                "filter": None,
                "strictness": 5,
                "top_n_documents": 10,
                "authentication": {
                    "type": "api_key",
                    "key": f"{search_key}"
                },
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "[redacted]"
                }
            }
        }]
    }
)
print(completion.to_json())

outputs:

{
  "id": "7eb67d03-3868-46fe-8cb1-fdf821c633be",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "To acquire [...].",
        "role": "assistant",
        "end_turn": true,
        "context": {
          "citations": [
            {
              "content": "You can copy to your computer  [...]",
              "title": "You can copy [...]",
              "url": "https://[redacted].blob.core.windows.net/fileupload-he/920.txt",
              "filepath": "000920.txt",
              "chunk_id": "0"
            },
            {
              "content": "Do\r\none of the following:\r\nChoose File &gt; Automate [...]",
              "title": "Do",
              "url": "https://storingspace.blob.core.windows.net/fileupload-b/002715.txt",
              "filepath": "002715.txt",
              "chunk_id": "0"
            },
            [...]
          ],
          "intent": "[\"How to import x\", \"Importing x\", \"Steps to import x\"]"
        }
      }
    }
  ],
  "created": 1720747501,
  "model": "gpt-4o",
  "object": "extensions.chat.completion",
  "system_fingerprint": "fp_abc28019ad",
  "usage": {
    "completion_tokens": 230,
    "prompt_tokens": 5480,
    "total_tokens": 5710
  }
}

Upvotes: 1

Answers (2)

user3503711

Reputation: 2066

Using Python:

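from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholders: substitute your own Azure AI Search service values.
service_endpoint = "https://<your-service>.search.windows.net"
index_name = "<your-index-name>"
api_key = "<your-admin-key>"  # write operations need an admin key
your_document_id = "<key-of-the-document-to-update>"

search_client = SearchClient(endpoint=service_endpoint,
                             index_name=index_name,
                             credential=AzureKeyCredential(api_key))

# Each entry holds the index's key field (assumed to be "id" here)
# plus the fields to change.
documents = []
document = {"id": your_document_id,
            "title": "title"}
documents.append(document)
search_client.merge_or_upload_documents(documents=documents)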

This merges the new title into the existing document that matches the given id, leaving its other fields unchanged. To update several documents, append one entry per document to the documents list and send them all in a single call.
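
When many titles need to change, the same call can be driven from a mapping of document keys to titles. The following is a minimal sketch, not a definitive implementation: the helper names build_title_updates, chunked, and update_titles are hypothetical, the key field is assumed to be named "id", and the batch size of 1000 is based on the documented per-request limit for Azure AI Search indexing. It uses merge_documents rather than merge_or_upload_documents so that a mistyped key is reported as a failure instead of creating a new, mostly empty document.

```python
from typing import Dict, Iterator, List


def build_title_updates(titles_by_key: Dict[str, str],
                        key_field: str = "id",
                        title_field: str = "title") -> List[dict]:
    """Turn {document key: new title} into partial documents for a merge."""
    return [{key_field: key, title_field: title}
            for key, title in titles_by_key.items()]


def chunked(items: List[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def update_titles(search_client, titles_by_key: Dict[str, str],
                  key_field: str = "id") -> List[str]:
    """Merge new titles into existing documents; return the keys that failed.

    `search_client` is expected to be an azure.search.documents.SearchClient.
    """
    failed = []
    for batch in chunked(build_title_updates(titles_by_key, key_field)):
        results = search_client.merge_documents(documents=batch)
        # Each result reports whether the merge for that key succeeded.
        failed.extend(r.key for r in results if not r.succeeded)
    return failed
```

With the search_client from the snippet above, a call such as update_titles(search_client, {your_document_id: "My custom title"}) returns the list of keys whose merge did not succeed, which is empty when every update went through.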

Upvotes: 1

JayashankarGS

Reputation: 8055

The Azure OpenAI On Your Data API cannot modify the Azure AI Search index. It only queries the index and returns results, with citations, based on the data stored there.

To change fields or field values, use the Azure AI Search API or SDK directly.

The Azure AI Search REST API documentation for Add, Update or Delete Documents describes how to do this.

In your case, the request body looks like the following, where key_field_name stands for the name of the key field in your index schema and its value is the unique key of the document to update:

{
  "value": [
    {
      "@search.action": "merge",
      "key_field_name": "unique_key_of_document",
      "title": "your_custom_title"
    },
    ...
  ]
}

Create one entry per document key with its new title, then send the request body through the REST API to update the documents.
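
A request body of that shape can be generated from a mapping of keys to titles and posted to the index's docs/index endpoint. This is a rough sketch under stated assumptions: the function names are hypothetical, the key field name "id" and the api-version "2023-11-01" are assumptions to check against your own index and service, and the search endpoint, index name, and admin key are supplied by the caller. It uses only the Python standard library.

```python
import json
import urllib.request
from typing import Dict


def build_merge_payload(titles_by_key: Dict[str, str],
                        key_field: str = "id") -> dict:
    """Build the request body: one "merge" action per document key."""
    return {
        "value": [
            {"@search.action": "merge", key_field: key, "title": title}
            for key, title in titles_by_key.items()
        ]
    }


def post_merge(search_endpoint: str, index_name: str, admin_key: str,
               payload: dict, api_version: str = "2023-11-01") -> dict:
    """POST the payload to the index and return the parsed JSON response."""
    url = (f"{search_endpoint.rstrip('/')}/indexes/{index_name}"
           f"/docs/index?api-version={api_version}")
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The response contains a per-key status, so a merge that failed (for example, because the key does not exist in the index) can be spotted by inspecting the returned entries.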

Upvotes: 1
