Reputation: 83197
I imported some text files into Azure OpenAI. After the import, I see a "title" field used for search, which I can't edit via the UI because it is greyed out.
How can I define the title for each document? For example, does the Azure OpenAI On Your Data API allow me to define the title for each document?
By default, titles are prepopulated via automated summarization (which seems to be simply truncation?). I can see some titles e.g. via:
import os
import pprint
from openai import AzureOpenAI
#from azure.identity import DefaultAzureCredential, get_bearer_token_provider

endpoint = os.getenv("ENDPOINT_URL", "https://[redacted].openai.azure.com/")
deployment = os.getenv("DEPLOYMENT_NAME", "[redacted GPT engine name]")
search_endpoint = os.getenv("SEARCH_ENDPOINT", "https://[redacted].search.windows.net")
search_key = os.getenv("SEARCH_KEY", "[redacted key]")
search_index = os.getenv("SEARCH_INDEX_NAME", "[redacted]")

# token_provider = get_bearer_token_provider(
#     DefaultAzureCredential(),
#     "https://cognitiveservices.azure.com/.default")

client = AzureOpenAI(
    azure_endpoint=endpoint,
    api_version="2024-05-01-preview",
    api_key='[redacted key]'
)
# azure_ad_token_provider=token_provider,

completion = client.chat.completions.create(
    model=deployment,
    messages=[
        {
            "role": "user",
            "content": "How can I sort a Python list?"
        }],
    max_tokens=800,
    temperature=0,
    top_p=1,
    frequency_penalty=0,
    presence_penalty=0,
    stop=None,
    stream=False,
    extra_body={
        "data_sources": [{
            "type": "azure_search",
            "parameters": {
                "endpoint": f"{search_endpoint}",
                "index_name": "[redacted]",
                "semantic_configuration": "default",
                "query_type": "vector_semantic_hybrid",
                "fields_mapping": {},
                "in_scope": True,
                "role_information": "You are an AI assistant that helps people find information.",
                "filter": None,
                "strictness": 5,
                "top_n_documents": 10,
                "authentication": {
                    "type": "api_key",
                    "key": f"{search_key}"
                },
                "embedding_dependency": {
                    "type": "deployment_name",
                    "deployment_name": "[redacted]"
                }
            }
        }]
    }
)
print(completion.to_json())
outputs:
{
  "id": "7eb67d03-3868-46fe-8cb1-fdf821c633be",
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "To acquire [...].",
        "role": "assistant",
        "end_turn": true,
        "context": {
          "citations": [
            {
              "content": "You can copy to your computer [...]",
              "title": "You can copy [...]",
              "url": "https://[redacted].blob.core.windows.net/fileupload-he/920.txt",
              "filepath": "000920.txt",
              "chunk_id": "0"
            },
            {
              "content": "Do\r\none of the following:\r\nChoose File > Automate [...]",
              "title": "Do",
              "url": "https://storingspace.blob.core.windows.net/fileupload-b/002715.txt",
              "filepath": "002715.txt",
              "chunk_id": "0"
            },
            [...]
          ],
          "intent": "[\"How to import x\", \"Importing x\", \"Steps to import x\"]"
        }
      }
    }
  ],
  "created": 1720747501,
  "model": "gpt-4o",
  "object": "extensions.chat.completion",
  "system_fingerprint": "fp_abc28019ad",
  "usage": {
    "completion_tokens": 230,
    "prompt_tokens": 5480,
    "total_tokens": 5710
  }
}
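To look at only the titles, a small helper along these lines could pull them out of the JSON printed above. This is a minimal sketch that assumes the response shape shown in that output; citation_titles is a hypothetical helper name, not part of the openai package, and the sample data is trimmed from the output above.

```python
import json


def citation_titles(completion_json: str) -> list[tuple[str, str]]:
    """Return (filepath, title) pairs for every citation in the response JSON."""
    response = json.loads(completion_json)
    pairs = []
    for choice in response.get("choices", []):
        # On Your Data responses carry citations under message.context.
        context = choice.get("message", {}).get("context", {})
        for citation in context.get("citations", []):
            pairs.append((citation.get("filepath"), citation.get("title")))
    return pairs


# Sample trimmed from the output above (hypothetical stand-in for completion.to_json()).
sample = json.dumps({
    "choices": [{
        "message": {
            "context": {
                "citations": [
                    {"title": "You can copy [...]", "filepath": "000920.txt"},
                    {"title": "Do", "filepath": "002715.txt"},
                ]
            }
        }
    }]
})

for filepath, title in citation_titles(sample):
    print(f"{filepath}: {title!r}")
```

With the real response, citation_titles(completion.to_json()) should list one pair per citation, which makes the truncation-like titles easy to spot.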
Upvotes: 1
Views: 219
Reputation: 2066
Using Python:
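from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Placeholders: replace with the values for your own search service.
service_endpoint = "https://[your-service].search.windows.net"
index_name = "[your-index]"
api_key = "[your-admin-key]"

search_client = SearchClient(endpoint=service_endpoint,
                             index_name=index_name,
                             credential=AzureKeyCredential(api_key))

documents = []
# "id" must be the key field of the index; only the fields listed here are changed.
document = {"id": "[your-document-id]",
            "title": "title"}
documents.append(document)
search_client.merge_or_upload_documents(documents=documents)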
The above merges/updates the title field of the existing document based on its id. For multiple documents, append them to the documents list and update all documents at once!
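For many documents, the list-building and upload steps described above could be wrapped roughly as follows. This is only a sketch under stated assumptions: build_title_updates, chunked and apply_title_updates are hypothetical helper names, the key field is assumed to be called "id", and the batch size of 1000 reflects the per-request document limit of the Azure AI Search indexing API as I understand it. The search_client argument is expected to be a SearchClient like the one created above.

```python
from typing import Iterator


def build_title_updates(titles: dict[str, str], key_field: str = "id") -> list[dict]:
    """Turn {document_key: new_title} into partial documents for a merge."""
    return [{key_field: doc_key, "title": title} for doc_key, title in titles.items()]


def chunked(items: list[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def apply_title_updates(search_client, titles: dict[str, str],
                        key_field: str = "id", batch_size: int = 1000) -> int:
    """Send the title updates in batches; returns how many documents were sent."""
    sent = 0
    for batch in chunked(build_title_updates(titles, key_field), batch_size):
        # Same call as in the snippet above, once per batch.
        search_client.merge_or_upload_documents(documents=batch)
        sent += len(batch)
    return sent
```

A call would look like apply_title_updates(search_client, {"[your-document-id]": "My custom title"}). Note that merge_or_upload_documents creates a new document when a key does not exist yet, so a mistyped id would add a near-empty document; merge_documents may be the safer choice if only existing documents should change.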
Upvotes: 1
Reputation: 8055
The Azure OpenAI On Your Data API does not let you modify the Azure AI Search index; it only returns results for the search query, with citations based on the data in the index.
To modify fields or field values, you need to use the Azure AI Search API or SDK.
The Azure AI Search documentation on Add, Update or Delete Documents shows how to do this with the REST API.
In your case, the request body looks like this:
{
  "value": [
    {
      "@search.action": "merge",
      "key_field_name": "unique_key_of_document",
      "title": "your_custom_title"
    },
    ...
  ]
}
Here key_field_name stands for the key field from your index schema, and its value is the unique key of the document to update. Create a request body that covers all the unique keys with your titles, then update the documents via the REST API.
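A request like the one described above could be assembled in Python roughly as follows. This is a hedged sketch, not a definitive implementation: build_merge_request is a hypothetical helper, the key field name "id" and the api-version value "2023-11-01" are assumptions to check against your index schema and service, and the URL follows the /indexes/[index]/docs/index pattern of the Azure AI Search REST API.

```python
import json
import urllib.request


def build_merge_request(service_endpoint: str, index_name: str, api_key: str,
                        titles: dict[str, str], key_field: str = "id",
                        api_version: str = "2023-11-01") -> urllib.request.Request:
    """Build (but do not send) a POST request that merges new titles into documents."""
    body = {
        "value": [
            # One "merge" action per document; only the title field is touched.
            {"@search.action": "merge", key_field: doc_key, "title": title}
            for doc_key, title in titles.items()
        ]
    }
    url = (f"{service_endpoint.rstrip('/')}/indexes/{index_name}"
           f"/docs/index?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values; replace them with your own service, index and admin key.
request = build_merge_request(
    "https://[your-service].search.windows.net",
    "[your-index]",
    "[your-admin-key]",
    {"unique_key_of_document": "your_custom_title"},
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen(request) would send it. The response should then report a status per document, which is worth checking because a merge action fails for keys that do not exist in the index.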
Upvotes: 1