ssvt

Reputation: 31

How to Add New Analyzers to Existing Fields in Azure AI Search Without Creating a New Index?

I've been working with Azure AI Search and encountered an issue while trying to change the structure of an existing index. Initially, I thought it was not possible to update an index until I tried adding to the JSON file and received this error:

"An index with the name 'myindex' in service 'myservice' could not be updated. Index update not allowed because it would cause downtime. In order to add new analyzers, normalizers, tokenizers, token filters, or character filters to an existing index, or modify its similarity settings, set the 'allowIndexDowntime' query parameter to 'true' in the index update request. Note that this operation will put your index offline for at least a few seconds, causing your indexing and query requests to fail. Performance and write availability of the index can be impaired for several minutes after the index is updated, or longer for very large indexes"

Following the guidance, I updated the index using the allowIndexDowntime=true parameter in my PUT request, which should allow changes with some temporary downtime.

However, when I attempt to add a new analyzer to one of my fields, I receive a 400 response with this error message:

{
    "error": {
        "code": "OperationNotAllowed",
        "message": "Existing field 'myFieldName' cannot be changed.",
        "details": [
            {
                "code": "CannotChangeExistingField",
                "message": "Existing field 'myFieldName' cannot be changed."
            }
        ]
    }
}

Is there a way to add new analyzers to existing fields without having to recreate the entire index each time? If anyone has experience with this or knows a workaround, your advice would be greatly appreciated!

Upvotes: 0

Views: 331

Answers (1)

Suresh Chikkam

Reputation: 3473

You cannot change the analyzer of an existing field in Azure AI Search (formerly Azure Cognitive Search) without recreating the index. The service does not allow most attributes of a field, including its analyzer, to be modified after the field has been created.

  • The only way to change an existing field's definition is to recreate the index.

Export the current index schema, then modify the schema JSON to define the new analyzer and assign it to the field that needs it.

Current index schema:

{
  "name": "myindex",
  "fields": [
    {
      "name": "myFieldName",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "analyzer": "standard.lucene"
    },
    ...
  ],
  ...
}

Next, define the new analyzer in the "analyzers" section and reference it by name from the field that should use it. In this example it is assigned to a field named myFieldName_analyzed:

{
  "fields": [
    {
      "name": "myFieldName_analyzed",
      "type": "Edm.String",
      "searchable": true,
      "filterable": false,
      "analyzer": "custom_analyzer"
    }
  ],
  "analyzers": [
    {
      "name": "custom_analyzer",
      "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
      "tokenizer": "standard_v2",
      "tokenFilters": ["lowercase", "asciifolding"]
    }
  ]
}
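The schema edit above can be sketched in Python. This is only an illustration, not the service's own tooling: the helper names `add_field_with_analyzer` and `build_update_url` are hypothetical, the `id` key field is assumed from the document upload example in this answer, and the `@odata.type` / `standard_v2` values follow the REST schema as I understand it, so check them against the API reference. The sketch only builds the URL and body for the PUT request and does not send anything.

```python
import copy
import json
from urllib.parse import urlencode


def add_field_with_analyzer(schema, new_field, new_analyzer):
    """Return a copy of `schema` with one extra field and one extra analyzer.

    Existing fields are left untouched, because the service rejects changes to them.
    """
    updated = copy.deepcopy(schema)
    fields = updated.setdefault("fields", [])
    if any(f["name"] == new_field["name"] for f in fields):
        raise ValueError(f"field {new_field['name']!r} already exists")
    fields.append(copy.deepcopy(new_field))
    updated.setdefault("analyzers", []).append(copy.deepcopy(new_analyzer))
    return updated


def build_update_url(service, index, api_version):
    """Build the PUT URL for an index update that accepts brief downtime."""
    query = urlencode({"api-version": api_version, "allowIndexDowntime": "true"})
    return f"https://{service}.search.windows.net/indexes/{index}?{query}"


# Exported schema (shortened); the "id" key field is an assumption.
current_schema = {
    "name": "myindex",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "myFieldName", "type": "Edm.String", "searchable": True,
         "filterable": False, "analyzer": "standard.lucene"},
    ],
}
new_field = {
    "name": "myFieldName_analyzed",
    "type": "Edm.String",
    "searchable": True,
    "filterable": False,
    "analyzer": "custom_analyzer",
}
new_analyzer = {
    "name": "custom_analyzer",
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "tokenizer": "standard_v2",
    "tokenFilters": ["lowercase", "asciifolding"],
}

updated_schema = add_field_with_analyzer(current_schema, new_field, new_analyzer)
url = build_update_url("myservice", "myindex", "2021-04-30-Preview")
body = json.dumps(updated_schema, indent=2)  # PUT body; send with an api-key header
print(url)
```

Sending `body` to `url` with PUT is left to whichever HTTP client you already use. The helper refuses to touch a field that already exists, which mirrors the CannotChangeExistingField error from the question.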

Then send a POST request to reindex the data for the new field. Here is the response:

{
  "statusCode": 200,
  "message": "Documents successfully reindexed for new field '***********'."
}

Updated:

Index API:

POST https://<search-service-name>.search.windows.net/indexes/<index-name>/docs/index?api-version=2021-04-30-Preview
Content-Type: application/json
api-key: <api-key>

{
  "value": [
    {
      "@search.action": "upload",
      "id": "123",
      "field1": "data",
      "field2": "data"
    }
  ]
}
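If the new field should hold the same text as the original one, the request bodies for this call can be generated from documents you have already read from the index or from your source system. The following is a minimal sketch under those assumptions: `build_backfill_batches` is a hypothetical helper, `id` is assumed to be the key field as in the request above, and the batch size of 1000 reflects the per-request document limit as I understand it. It uses the "merge" action so that only the new field is written on documents that already exist.

```python
import json


def build_backfill_batches(docs, key_field, source_field, target_field, batch_size=1000):
    """Yield request bodies that copy `source_field` into `target_field`.

    Documents without the source field are skipped.
    """
    batch = []
    for doc in docs:
        if source_field not in doc:
            continue
        batch.append({
            "@search.action": "merge",  # update only the listed fields
            key_field: doc[key_field],
            target_field: doc[source_field],
        })
        if len(batch) == batch_size:
            yield {"value": batch}
            batch = []
    if batch:
        yield {"value": batch}


# Example documents; in practice these come from your own data source.
docs = [
    {"id": "1", "myFieldName": "Crème brûlée"},
    {"id": "2", "myFieldName": "Espresso"},
    {"id": "3"},  # no source text, skipped
    {"id": "4", "myFieldName": "Café au lait"},
]

batches = list(build_backfill_batches(docs, "id", "myFieldName", "myFieldName_analyzed", batch_size=2))
for payload in batches:
    print(json.dumps(payload, ensure_ascii=False))
```

Each yielded dictionary is one body for the POST to `/docs/index` shown above.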

To list and filter blobs by their Last-Modified date using the Python SDK:

from azure.storage.blob import BlobServiceClient
from datetime import datetime, timezone

# Initialize BlobServiceClient
blob_service_client = BlobServiceClient.from_connection_string("<your_connection_string>")
container_client = blob_service_client.get_container_client("<container_name>")

# Retrieve the last synced timestamp (from your storage or database).
# Blob timestamps are timezone-aware (UTC), so make this value timezone-aware as well;
# comparing a naive datetime with an aware one raises TypeError.
last_synced = datetime.strptime("2023-10-21T12:00:00Z", "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

# List blobs in the container
blobs_list = container_client.list_blobs()

# Process blobs modified after the last sync
for blob in blobs_list:
    # last_modified is an attribute of each listed blob's properties object
    last_modified = blob.last_modified
    if last_modified > last_synced:
        # Process the blob (e.g., push to Azure AI Search)
        print(f"Processing {blob.name}")

# After processing, update the last synced timestamp (timezone-aware UTC)
new_last_synced = datetime.now(timezone.utc)

Upvotes: 1
