Suraj Dalvi

Reputation: 1078

Elasticsearch: getting a version conflict even when updating documents sequentially using update_by_query

I am trying to update a document's nested-type field using update_by_query. I am using the following script and query:

POST test/_update_by_query
{
  "script": {
    "source": "ctx._source.address = params.address",
    "params": {
      "address": [{"city": "Mumbai"}]
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "uid": "b123"
          }
        }
      ]
    }
  }
}

But I am getting the following error:

version conflict, required seqNo [607], primary term [16]. current document has seqNo [608] and primary term [16]

What is the reason for this error, and how can I fix it? Is there another API I can use here instead of _update_by_query?

Upvotes: 7

Views: 13573

Answers (4)

Wesley Cheek

Reputation: 1696

I'm using CustomResource from the AWS CDK to add and update roles on an OpenSearch (AWS ElasticSearch) domain.

Everything is based on the docs that describe request signing.

I've been running into this version conflict issue, so I implemented refresh=true and retry_on_conflict=3 from the other answers. Here's what my Python CustomResource looks like; its requests are sent to the domain through a Node.js Lambda function:

# CDK v2 import path (assumption); in CDK v1, CustomResource lives in aws_cdk.core
from aws_cdk import CustomResource, custom_resources as cr

...

es_requests_provider = cr.Provider(
    scope=self,
    id="es_requests_provider",
    on_event_handler=self.es_requests_func,  # Lambda function that actually talks to the domain
)
modify_role = CustomResource(
    scope=self,
    id="es_requests_resource_modifyRole",
    service_token=es_requests_provider.service_token,
    properties={
        "requests": [
            {
                "method": "PUT",
                "retry_on_conflict": "3",
                "refresh": "true",
                "path": "_opendistro/_security/api/rolesmapping/lambda_write_access",
                "body": {
                    "backend_roles": [
                        addToSearch_lambda.role.role_arn,  # I'm adding a role for another lambda function to add stuff to the domain
                    ],
                    "hosts": [],
                    "users": [],
                },
            },
        ]
    },
)

Hope this helps someone.

Upvotes: 0

Rafiq

Reputation: 11465

I had the same problem: I needed two queries to execute one after another on the same index. Using refresh=true solved it.

https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/update_by_query_examples.html

        await elasticWrapper.client.updateByQuery({
          index: ElasticIndex.Customer,
          refresh: true,
          body: {
            query: {
              bool: {
                must: [
                  {
                    match: {
                      'user.id': id,
                    },
                  },
                ],
              },
            },
            script: {
              source: `ctx._source.user=params.user`,
              lang: 'painless',
              params: {
                user: { id, name: fullName, thumbnail },
              },
            },
          },
        });

        await elasticWrapper.client.updateByQuery({
          index: ElasticIndex.Customer,
          refresh: true,
          body: {
            query: {
              nested: {
                path: 'tasks',
                query: {
                  bool: {
                    must: [
                      {
                        exists: {
                          field: 'tasks.assignee',
                        },
                      },
                      {
                        term: {
                          'tasks.assignee.id': id,
                        },
                      },
                    ],
                  },
                },
              },
            },
            script: {
              source: `for (int i = 0; i < ctx._source.tasks.size();i++){
                if(ctx._source.tasks[i].assignee.id == params.id){
                  ctx._source.tasks[i].assignee.thumbnail = params.thumbnail
                }
               }`,
              lang: 'painless',
              params: {
                id,
                thumbnail,
              },
            },
          },
        });

Upvotes: 2

Prashant

Reputation: 51

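You can pass refresh=True as an argument to your update by query call. This refreshes the affected shards once the request completes, so the next update by query searches the latest version of each document rather than a stale one.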
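As a rough illustration of where the flag goes, here is a minimal sketch that only builds the REST path and body for the request from the question, with refresh enabled. The helper name build_update_by_query is made up for this example, and the sketch does not send anything to a cluster.

```python
import json
from urllib.parse import urlencode


def build_update_by_query(index, uid, address, refresh=True):
    """Return (path, body) for an update by query; hypothetical helper."""
    # refresh is passed as a query-string parameter, not inside the body.
    params = {"refresh": "true" if refresh else "false"}
    path = f"/{index}/_update_by_query?{urlencode(params)}"
    body = {
        "script": {
            "source": "ctx._source.address = params.address",
            "params": {"address": address},
        },
        "query": {"bool": {"must": [{"term": {"uid": uid}}]}},
    }
    return path, json.dumps(body)


path, body = build_update_by_query("test", "b123", [{"city": "Mumbai"}])
print("POST", path)
print(body)
```

The resulting path and body can be sent with whatever HTTP client or Elasticsearch client you already use.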

Upvotes: 5

Val

Reputation: 217314

Update by query takes a snapshot of the data and then updates each matching document. This error means that the document has been updated by another process after your update by query call started running...
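To make the mechanism concrete, here is a small in-memory sketch of sequence-number checking. TinyIndex and VersionConflict are stand-ins written for this example and are not part of Elasticsearch or its clients; the sketch only mimics the idea that a write carrying a stale sequence number is rejected.

```python
class VersionConflict(Exception):
    """Raised when a write carries a stale sequence number."""


class TinyIndex:
    """In-memory stand-in for an index; NOT Elasticsearch itself."""

    def __init__(self):
        self.docs = {}        # doc_id -> (seq_no, source)
        self.next_seq_no = 0

    def index(self, doc_id, source, if_seq_no=None):
        current = self.docs.get(doc_id)
        # Reject the write if the caller's view of the document is outdated.
        if if_seq_no is not None and current is not None and current[0] != if_seq_no:
            raise VersionConflict(
                f"required seqNo [{if_seq_no}], current document has seqNo [{current[0]}]"
            )
        self.docs[doc_id] = (self.next_seq_no, dict(source))
        self.next_seq_no += 1

    def snapshot(self):
        # Copy of every document together with the seq_no it had at this moment.
        return {doc_id: (seq, dict(src)) for doc_id, (seq, src) in self.docs.items()}


idx = TinyIndex()
idx.index("1", {"uid": "b123"})          # document gets seq_no 0
snap = idx.snapshot()                     # "update by query" starts here
idx.index("1", {"uid": "b123", "x": 1})  # another writer bumps it to seq_no 1

seq_no, source = snap["1"]
source["address"] = [{"city": "Mumbai"}]
try:
    idx.index("1", source, if_seq_no=seq_no)  # write based on the stale snapshot
    outcome = "updated"
except VersionConflict as exc:
    outcome = f"conflict: {exc}"
print(outcome)  # conflict: required seqNo [0], current document has seqNo [1]
```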

You can choose to ignore those conflicts by doing this:

POST test/_update_by_query?conflicts=proceed

In the response, you're going to have an indication of how many documents were in conflict and you can run the update by query again to pick them up if desired.
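A minimal sketch of that "run it again" loop is below. It assumes a hypothetical zero-argument callable, run_update_by_query, that sends the conflicts=proceed request and returns the parsed JSON response as a dict; the only response field it relies on is version_conflicts. The fake responses stand in for a real cluster.

```python
def update_until_clean(run_update_by_query, max_rounds=5):
    """Re-run an update by query until a round reports no version conflicts.

    run_update_by_query is a hypothetical callable that performs
    POST <index>/_update_by_query?conflicts=proceed and returns the response dict.
    """
    for round_number in range(1, max_rounds + 1):
        response = run_update_by_query()
        if response.get("version_conflicts", 0) == 0:
            return {"rounds": round_number, "last_response": response}
    raise RuntimeError(f"still seeing version conflicts after {max_rounds} rounds")


# Stand-in for a real cluster: the first round skips one document, the second is clean.
fake_responses = iter([
    {"updated": 4, "version_conflicts": 1},
    {"updated": 5, "version_conflicts": 0},
])
result = update_until_clean(lambda: next(fake_responses))
print(result)
```

Note that a re-run applies the script again to every document the query still matches. That is harmless for a plain assignment like the one in the question, but a non-idempotent script would need a query that excludes documents that were already updated.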

Update:

If you need to update only a single document and you know its ID, then you don't need to use update by query, but simply the update endpoint. The big advantage is that the update endpoint has a parameter called retry_on_conflict which will retry the operation in case of conflicts, so that you can be sure that the document is eventually updated when the call returns:

POST test/_doc/123/_update?retry_on_conflict=3
{
  "doc": {
    "address": [{"city":"Mumbai"}]
  }
}
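For reference, here is a small sketch that builds the same single-document update as a method, path and body. The helper name build_update_request is made up for this example. It uses the typeless POST <index>/_update/<id> path that Elasticsearch 7 and later expose, which is an assumption about your version; nothing is sent to a cluster.

```python
import json
from urllib.parse import quote, urlencode


def build_update_request(index, doc_id, partial_doc, retry_on_conflict=3):
    """Return (method, path, body) for a partial update; hypothetical helper."""
    query = urlencode({"retry_on_conflict": retry_on_conflict})
    # Typeless update path (Elasticsearch 7+); the ID is URL-encoded.
    path = f"/{index}/_update/{quote(str(doc_id), safe='')}?{query}"
    body = json.dumps({"doc": partial_doc})
    return "POST", path, body


method, path, body = build_update_request(
    "test", "123", {"address": [{"city": "Mumbai"}]}
)
print(method, path)
print(body)
```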

Upvotes: 8
