Azure OpenAI citations with message correlation

Question

I am trying out Azure OpenAI with my own data. The data is uploaded to Azure Blob Storage and indexed for use with Azure AI Search.

I call the endpoint with a request of the form POST {endpoint}/openai/deployments/{deployment-id}/chat/completions?api-version={api-version}, as referenced here.

However, in the response I cannot figure out how the choices[0]['message']['context']['citations'] field corresponds to choices[0]['message']['content'].

For example, the content can look something like:

I have a pear [doc1][doc2]. I have an apple [doc1][doc3].

However, my citations look like:

citations[0].filepath == 'file1.pdf'
citations[1].filepath == 'file2.pdf'
citations[2].filepath == 'file1.pdf'
citations[3].filepath == 'file3.pdf'
citations[4].filepath == 'file4.pdf'
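
For concreteness, here is a minimal sketch of how the two fields can be read from the parsed response. It assumes the JSON body has already been decoded into a dict; the helper name `extract_answer` and the `sample_response` payload are hypothetical and trimmed to the fields mentioned above.

```python
def extract_answer(response):
    """Return (content, filepaths) from a parsed chat completions response."""
    message = response["choices"][0]["message"]
    content = message["content"]
    # Assumption: `context` may be absent (for example, when no data source
    # was used), so fall back to an empty citation list instead of raising.
    citations = message.get("context", {}).get("citations", [])
    filepaths = [citation.get("filepath") for citation in citations]
    return content, filepaths


# Hypothetical payload mirroring the example in this question.
sample_response = {
    "choices": [
        {
            "message": {
                "content": "I have a pear [doc1][doc2]. I have an apple [doc1][doc3].",
                "context": {
                    "citations": [
                        {"filepath": "file1.pdf"},
                        {"filepath": "file2.pdf"},
                        {"filepath": "file1.pdf"},
                        {"filepath": "file3.pdf"},
                        {"filepath": "file4.pdf"},
                    ]
                },
            }
        }
    ]
}

content, filepaths = extract_answer(sample_response)
print(content)
for i, path in enumerate(filepaths):
    print(f"citations[{i}].filepath == {path!r}")
```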

In summary, my question is whether there is some sort of mapping from the doc placeholders shown in the message to the filepath of each entry in citations.

JayashankarGS · Accepted Answer

The mapping does not depend on the length of the citations list; it depends on how many times each file is referenced.

If you look closely, 'file1.pdf' is referenced twice, so the mapping follows the order in which the doc placeholders appear in the content, including reuse, as below:

doc1 -> citations[0] (file1.pdf)
doc2 -> citations[1] (file2.pdf)
doc1 (reused) -> citations[2] (file1.pdf), the same file as the first doc1
doc3 -> citations[3] (file3.pdf)

Use the code below to build the mapping and substitute it into the content.

import re

def map_citations(content, citations):
    # Match placeholders such as [doc1]; the group captures the number.
    pattern = re.compile(r'\[doc(\d+)\]')

    # Collect the doc numbers in order of appearance, e.g. [1, 2, 1, 3].
    doc_numbers = [int(num) for num in pattern.findall(content)]

    # Pair the i-th placeholder occurrence with citations[i].
    # A reused doc number is overwritten by its later occurrence.
    doc_to_file_map = {}
    for i, doc_num in enumerate(doc_numbers):
        doc_to_file_map[f'doc{doc_num}'] = citations[i]['filepath']

    print(doc_to_file_map)

    # Replace each [docN] placeholder with its mapped file path.
    def replace_placeholder(match):
        doc_num = match.group(1)
        return f"[{doc_to_file_map[f'doc{doc_num}']}]"

    mapped_content = pattern.sub(replace_placeholder, content)

    return mapped_content

content = "I have a pear [doc1][doc2]. I have an apple [doc1][doc3]."
citations = [
    {'filepath': 'file1.pdf'},
    {'filepath': 'file2.pdf'},
    {'filepath': 'file1.pdf'},
    {'filepath': 'file3.pdf'},
    {'filepath': 'file4.pdf'}
]

mapped_content = map_citations(content, citations)
print(mapped_content)

Output:

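{'doc1': 'file1.pdf', 'doc2': 'file2.pdf', 'doc3': 'file3.pdf'}
I have a pear [file1.pdf][file2.pdf]. I have an apple [file1.pdf][file3.pdf].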
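
The mapping above is built from the order in which the placeholders appear. It is worth checking against a real response whether the number in [docN] is instead simply a 1-based position in the citations list. The sketch below assumes that reading (doc1 -> citations[0], doc3 -> citations[2]); the function name `map_citations_by_index` is hypothetical, and placeholders whose number falls outside the list are left untouched rather than raising.

```python
import re

# Matches placeholders such as [doc1]; the group captures the number.
DOC_PATTERN = re.compile(r"\[doc(\d+)\]")


def map_citations_by_index(content, citations):
    """Replace [docN] with the filepath of citations[N - 1] (assumed 1-based)."""

    def replace_placeholder(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(citations):
            return f"[{citations[index]['filepath']}]"
        # Unknown document number: keep the original placeholder.
        return match.group(0)

    return DOC_PATTERN.sub(replace_placeholder, content)


content = "I have a pear [doc1][doc2]. I have an apple [doc1][doc3]."
citations = [
    {"filepath": "file1.pdf"},
    {"filepath": "file2.pdf"},
    {"filepath": "file1.pdf"},
    {"filepath": "file3.pdf"},
    {"filepath": "file4.pdf"},
]

mapped = map_citations_by_index(content, citations)
print(mapped)
# I have a pear [file1.pdf][file2.pdf]. I have an apple [file1.pdf][file1.pdf].
```

Under this assumption doc3 resolves to citations[2], which is a second entry for file1.pdf, so the result differs from the order-of-appearance mapping for that placeholder. Comparing both against the cited text in an actual response should show which reading holds.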
