Shravan K Subrahmanya

Reputation: 21

Error Deleting Vector Entries in Azure Cognitive Search: Missing Required Positional Argument 'batch'

I'm working on an Azure Function that triggers when a document is marked as deleted in an Azure Cosmos DB container. When this happens, I want to delete the associated messages in Cosmos DB and also remove corresponding vector entries from Azure Cognitive Search.

Here's the relevant part of my code for the Azure Cognitive Search deletion:

import azure.functions as func
from azure.cosmos import CosmosClient, exceptions
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
import os
import logging

chat_deletion_trigger = func.Blueprint()

@chat_deletion_trigger.cosmos_db_trigger(
    connection="CosmosDBConnectionString",
    database_name="ChatDatabase",
    container_name="Chats",
    lease_container_name="leases",
    create_lease_container_if_not_exists=True,
    arg_name="documents"
)
def ChatDeletionTrigger(documents: func.DocumentList) -> None:
    logging.info("ChatDeletionTrigger function started processing.")

    cosmos_client = CosmosClient.from_connection_string(os.environ['CosmosDBConnectionString'])
    database_name = 'ChatDatabase'
    database = cosmos_client.get_database_client(database_name)
    messages_container = database.get_container_client('Messages')

    search_client = SearchClient(
        endpoint=os.getenv('SearchServiceEndpoint'),
        index_name="azureblob-index",
        credential=AzureKeyCredential(os.getenv('SearchServiceKey'))
    )

    for document in documents:
        chat_id = document.get('id')
        is_deleted = document.get('isDeleted', False)

        if not chat_id:
            logging.error("Document does not have an 'id' field. Skipping document.")
            continue

        if not is_deleted:
            logging.info(f"Chat with id: {chat_id} is not marked as deleted. Skipping document.")
            continue

        logging.info(f"Processing deletion of messages and vector entries associated with chat id: {chat_id}")

        try:
            messages_container.scripts.execute_stored_procedure(
                sproc="deleteMessagesByChatId",
                params=[chat_id],
                partition_key=chat_id
            )
            logging.info(f"Deleted messages associated with chatId {chat_id} using stored procedure.")

        except exceptions.CosmosHttpResponseError as e:
            logging.error(f"Error deleting messages for chat id {chat_id}: {str(e)}")

        try:
            results = search_client.search(search_text="", filter=f"chatId eq '{chat_id}'")

            document_ids = [doc['documentId'] for doc in results]

            if document_ids:
                delete_actions = [{"@search.action": "delete", "documentId": doc_id} for doc_id in document_ids]
                batch = {"value": delete_actions}
                search_client.index_documents(batch=batch)
                logging.info(f"Deleted vector entries for chatId: {chat_id}")

            else:
                logging.info(f"No vector entries found for chatId: {chat_id}")

        except Exception as e:
            logging.error(f"Error deleting vector entries for chat id {chat_id}: {str(e)}")

    logging.info("ChatDeletionTrigger function completed processing.")

Issue: When I attempt to delete the vector entries associated with a chatId, I receive the following error: Error deleting vector entries for chat id {chat_id}: SearchClient.index_documents() missing 1 required positional argument: 'batch'

I checked the Azure Cognitive Search documentation but didn't find a clear explanation for this issue.

Question: How can I correctly pass the batch of delete actions to the index_documents method in Azure Cognitive Search? Am I structuring the batch parameter incorrectly, or is there something else I'm missing? Any help would be appreciated!

Upvotes: 0

Views: 129

Answers (2)

Shravan K Subrahmanya

Reputation: 21

To fix the error when deleting vector entries from Azure Cognitive Search, build the batch of delete actions as an IndexDocumentsBatch before passing it to the index_documents method.

Problem Explanation:

The index_documents method expects an IndexDocumentsBatch object, not a plain dictionary. The original code passes batch = {"value": delete_actions}, which has the shape of the REST API request body rather than the SDK's batch type. Plain lists of documents are accepted by helper methods such as delete_documents instead.

Solution:

You should use the IndexDocumentsBatch class to create a batch of delete actions and then pass it to the index_documents method. Here's how you can modify your code to fix the issue:

import azure.functions as func
from azure.cosmos import CosmosClient, exceptions
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient, IndexDocumentsBatch
import os
import logging

chat_deletion_trigger = func.Blueprint()

@chat_deletion_trigger.cosmos_db_trigger(
    connection="CosmosDBConnectionString",
    database_name="ChatDatabase",
    container_name="Chats",
    lease_container_name="leases",
    create_lease_container_if_not_exists=True,
    arg_name="documents"
)
def ChatDeletionTrigger(documents: func.DocumentList) -> None:
    logging.info("ChatDeletionTrigger function started processing.")

    # Initialize Cosmos DB client
    cosmos_client = CosmosClient.from_connection_string(os.environ['CosmosDBConnectionString'])
    database_name = 'ChatDatabase'
    database = cosmos_client.get_database_client(database_name)
    messages_container = database.get_container_client('Messages')

    # Initialize Azure Cognitive Search client
    search_client = SearchClient(
        endpoint=os.getenv('SearchServiceEndpoint'),
        index_name="azureblob-index",  # Replace with your actual index name
        credential=AzureKeyCredential(os.getenv('SearchServiceKey'))
    )

    for document in documents:
        chat_id = document.get('id')
        is_deleted = document.get('isDeleted', False)

        if not chat_id:
            logging.error("Document does not have an 'id' field. Skipping document.")
            continue

        # Check if the chat is marked as deleted
        if not is_deleted:
            logging.info(f"Chat with id: {chat_id} is not marked as deleted. Skipping document.")
            continue

        logging.info(f"Processing deletion of messages and vector entries associated with chat id: {chat_id}")

        try:
            # Call the stored procedure to delete all messages by chatId
            messages_container.scripts.execute_stored_procedure(
                sproc="deleteMessagesByChatId",
                params=[chat_id],
                partition_key=chat_id  # Assuming partition key is chatId
            )
            logging.info(f"Deleted messages associated with chatId {chat_id} using stored procedure.")

        except exceptions.CosmosHttpResponseError as e:
            logging.error(f"Error deleting messages for chat id {chat_id}: {str(e)}")

        try:
            # Fetch the documents to identify the correct document IDs
            results = search_client.search(search_text="", filter=f"chatId eq '{chat_id}'")

            # Build key-only documents: a delete action identifies each document by the index key field
            # (metadata_storage_path here; replace it with the key field of your index)
            document_ids = [{"metadata_storage_path": doc['metadata_storage_path']} for doc in results]

            if document_ids:
                # Create a batch and add one delete action per key-only document
                batch = IndexDocumentsBatch()
                batch.add_delete_actions(*document_ids)

                # Execute the batch delete operation
                search_client.index_documents(batch=batch)
                logging.info(f"Deleted vector entries for chatId: {chat_id}")

            else:
                logging.info(f"No vector entries found for chatId: {chat_id}")

        except Exception as e:
            logging.error(f"Error deleting vector entries for chat id {chat_id}: {str(e)}")

    logging.info("ChatDeletionTrigger function completed processing.")
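The filter in the code above interpolates chat_id directly into the OData expression. If a chat id could ever contain a single quote, the filter would break, because OData string literals escape a single quote by doubling it. A minimal sketch of a helper for that follows; the function names odata_string_literal and chat_filter are hypothetical and not part of the Azure SDK.

```python
def odata_string_literal(value: str) -> str:
    """Quote a value as an OData string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def chat_filter(chat_id: str, field: str = "chatId") -> str:
    """Build the filter expression that selects index entries for one chat.

    `field` defaults to the chatId field used in the question; adjust it to
    whatever filterable field your index actually defines.
    """
    return f"{field} eq {odata_string_literal(chat_id)}"
```

The result could then be passed as the filter argument, for example search_client.search(search_text="", filter=chat_filter(chat_id)).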

Key Changes:

  • Use of IndexDocumentsBatch: The code now creates an IndexDocumentsBatch object and adds delete actions using the add_delete_actions method.
  • Correct Batch Structure: The delete actions are added to the batch using the correct document key (metadata_storage_path in this case).
  • Passing the Batch to index_documents: The index_documents method now correctly receives the batch of delete actions.
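Since the key field differs between indexes (documentId in the question, metadata_storage_path here), it may help to isolate the step that turns search hits into key-only documents. The sketch below is plain Python under the assumption that each search hit behaves like a dict; build_delete_documents is a hypothetical helper name, not an SDK function.

```python
from typing import Any, Dict, Iterable, List


def build_delete_documents(hits: Iterable[Dict[str, Any]], key_field: str) -> List[Dict[str, Any]]:
    """Reduce search hits to key-only documents suitable for delete actions.

    Hits that lack the key field are skipped, and duplicate keys are
    dropped so each document is deleted at most once.
    """
    seen = set()
    docs: List[Dict[str, Any]] = []
    for hit in hits:
        key = hit.get(key_field)
        if key is None or key in seen:
            continue
        seen.add(key)
        docs.append({key_field: key})
    return docs
```

The returned list can be handed to batch.add_delete_actions(docs) or to search_client.delete_documents(documents=docs), with key_field set to the key field of your own index.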

Upvotes: 0

Ikhtesam Afrin

Reputation: 6487

Use the delete_documents method instead of index_documents. Modify your code as shown below to get the expected response.

import azure.functions as func
import logging
from azure.cosmos import CosmosClient
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
import os

app = func.FunctionApp()

@app.cosmos_db_trigger(arg_name="documents", container_name="Chats",
                        database_name="ChatDatabase", connection="CosmosDBConnectionString",
                        lease_container_name="leases", create_lease_container_if_not_exists=True)
def ChatDeletionTrigger(documents: func.DocumentList):
    logging.info('Python CosmosDB triggered.')

    cosmos_client = CosmosClient.from_connection_string(os.environ['CosmosDBConnectionString'])
    database_name = 'ChatDatabase'
    database = cosmos_client.get_database_client(database_name)
    messages_container = database.get_container_client('Chats')

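    # The endpoint and key below are placeholders; substitute your own values or read them from app settings
    search_client = SearchClient(
        endpoint="https://******.search.windows.net",
        index_name="azureblob-index",
        credential=AzureKeyCredential("SearchServiceKey")
    )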

    for document in documents:
        chat_id = document.get('id')

        if not chat_id:
            logging.error("Document does not have an 'id' field. Skipping document.")
            continue

        logging.info(f"Processing deletion of messages and vector entries associated with chat id: {chat_id}")

        try:
            results = search_client.search(search_text="", filter=f"chatId eq '{chat_id}'")

            document_ids = [doc['documentId'] for doc in results]

            if document_ids:
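                # delete_documents issues the delete actions itself and matches documents by the key field (documentId here)
                delete_actions = [{"@search.action": "delete", "documentId": doc_id} for doc_id in document_ids]
                search_client.delete_documents(documents=delete_actions)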
                logging.info(f"Deleted vector entries for chatId: {chat_id}")

            else:
                logging.info(f"No vector entries found for chatId: {chat_id}")

        except Exception as e:
            logging.error(f"Error deleting vector entries for chat id {chat_id}: {str(e)}")

    logging.info("ChatDeletionTrigger function completed processing.")

I am able to delete a document by using the document Id.

Azure Functions Core Tools
Core Tools Version:       4.0.5907 Commit hash: N/A +807e89766a92b14fd07b9f0bc2bea1d8777ab209 (64-bit)
Function Runtime Version: 4.834.3.22875

[2024-08-28T10:34:44.071Z] 0.02s - Debugger warning: It seems that frozen modules are being used, which may
[2024-08-28T10:34:44.073Z] 0.00s - make the debugger miss breakpoints. Please pass -Xfrozen_modules=off
[2024-08-28T10:34:44.073Z] 0.00s - to python to disable frozen modules.
[2024-08-28T10:34:44.074Z] 0.00s - Note: Debugging will proceed. Set PYDEVD_DISABLE_FILE_VALIDATION=1 to disable this validation.
[2024-08-28T10:34:44.165Z] Worker process started and initialized.

Functions:

        ChatDeletionTrigger: cosmosDBTrigger

For detailed output, run func with --verbose flag.
[2024-08-28T10:34:49.093Z] Host lock lease acquired by instance ID '0000000000000000000000000D2022A4'.
[2024-08-28T10:35:06.852Z] Executing 'Functions.ChatDeletionTrigger' (Reason='New changes on container Chats at 2024-08-28T10:35:06.8097381Z', Id=5d5c5b7b-bc06-4383-bb77-67bfaab2dc32)
[2024-08-28T10:35:06.963Z] Python CosmosDB triggered.
[2024-08-28T10:35:07.748Z] Request URL: 'https://******.documents.azure.com:443/'
Request method: 'GET'
Request headers:
    'Cache-Control': 'no-cache'
    'x-ms-version': 'REDACTED'
    'x-ms-documentdb-query-iscontinuationexpected': 'REDACTED'
    'x-ms-date': 'REDACTED'
    'authorization': 'REDACTED'
    'Accept': 'application/json'
    'Content-Length': '0'
    'User-Agent': 'azsdk-python-cosmos/4.7.0 Python/3.11.9 (Windows-10-10.0.22631-SP0)'
No body was attached to the request
[2024-08-28T10:35:08.655Z] Response status: 200
Response headers:
    'Cache-Control': 'no-store, no-cache'
    'Pragma': 'no-cache'
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json'
    'Content-Location': 'REDACTED'
    'Server': 'Microsoft-HTTPAPI/2.0'
    'x-ms-max-media-storage-usage-mb': 'REDACTED'
    'x-ms-media-storage-usage-mb': 'REDACTED'
    'x-ms-databaseaccount-consumed-mb': 'REDACTED'
    'x-ms-databaseaccount-reserved-mb': 'REDACTED'
    'x-ms-databaseaccount-provisioned-mb': 'REDACTED'
    'Strict-Transport-Security': 'REDACTED'
    'x-ms-gatewayversion': 'REDACTED'
    'Date': 'Wed, 28 Aug 2024 10:35:06 GMT'
[2024-08-28T10:35:08.668Z] Processing deletion of messages and vector entries associated with chat id: 123
[2024-08-28T10:35:08.672Z] Request URL: 'https://******.search.windows.net/indexes('azureblob-index')/docs/search.post.search?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '43'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': '32787a9a-6529-11ef-898d-7c214ae5d066'
    'User-Agent': 'azsdk-python-search-documents/11.5.1 Python/3.11.9 (Windows-10-10.0.22631-SP0)'
A body is sent with the request
[2024-08-28T10:35:09.711Z] Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.streaming=true; charset=utf-8'
    'Content-Encoding': 'REDACTED'
    'Vary': 'REDACTED'
    'Server': 'Microsoft-IIS/10.0'
    'Strict-Transport-Security': 'REDACTED'
    'Preference-Applied': 'REDACTED'
    'OData-Version': 'REDACTED'
    'request-id': '32787a9a-6529-11ef-898d-7c214ae5d066'
    'elapsed-time': 'REDACTED'
    'Date': 'Wed, 28 Aug 2024 10:35:08 GMT'
[2024-08-28T10:35:09.724Z] Request URL: 'https://******.search.windows.net/indexes('azureblob-index')/docs/search.index?api-version=REDACTED'
Request method: 'POST'
Request headers:
    'Content-Type': 'application/json'
    'Content-Length': '63'
    'api-key': 'REDACTED'
    'Accept': 'application/json;odata.metadata=none'
    'x-ms-client-request-id': '3319fce2-6529-11ef-89cd-7c214ae5d066'
    'User-Agent': 'azsdk-python-search-documents/11.5.1 Python/3.11.9 (Windows-10-10.0.22631-SP0)'
A body is sent with the request
[2024-08-28T10:35:09.980Z] Response status: 200
Response headers:
    'Transfer-Encoding': 'chunked'
    'Content-Type': 'application/json; odata.metadata=none; odata.streaming=true; charset=utf-8'
    'Content-Encoding': 'REDACTED'
    'Vary': 'REDACTED'
    'Server': 'Microsoft-IIS/10.0'
    'Strict-Transport-Security': 'REDACTED'
    'Preference-Applied': 'REDACTED'
    'OData-Version': 'REDACTED'
    'request-id': '3319fce2-6529-11ef-89cd-7c214ae5d066'
    'elapsed-time': 'REDACTED'
    'Date': 'Wed, 28 Aug 2024 10:35:08 GMT'
[2024-08-28T10:35:09.984Z] ChatDeletionTrigger function completed processing.
[2024-08-28T10:35:09.983Z] Deleted vector entries for chatId: 123
[2024-08-28T10:35:10.048Z] Executed 'Functions.ChatDeletionTrigger' (Succeeded, Id=5d5c5b7b-bc06-4383-bb77-67bfaab2dc32, Duration=3214ms)
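The run above deletes a single document. For a chat with many vector entries, note that Azure Cognitive Search documents a limit of 1000 actions per indexing request, so a large delete list may need to be split. A minimal sketch of such a split, assuming the list of key-only documents has already been built; chunked and MAX_ACTIONS_PER_BATCH are hypothetical names, not SDK members.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Assumption: 1000 is the documented per-request action limit for the service;
# check the service limits for your tier before relying on it.
MAX_ACTIONS_PER_BATCH = 1000


def chunked(items: Sequence[T], size: int = MAX_ACTIONS_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

Each chunk could then be sent in its own call, for example by looping over chunked(delete_actions) and calling search_client.delete_documents(documents=chunk) inside the loop.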

Upvotes: 0
