Anmol Bhatia
Anmol Bhatia

Reputation: 326

Elasticsearch: Bulk update for nested object

I have my document structure like this:

{
    "documentID": 123,
    "originalFilename": "Build a Better Post.pdf",
    "modDate": "2017-11-16T18:22:54.48",
    "documentType": "pdf",
    "keySystem": "web",
    "title": "Build a Better Post",
    "createPreview": false,
    "uploadedBy": "DA5208B3-2198-44C6-8256-0AEBC4DD1588",
    "streamItemData": {
        "itemID": 800,
        "author": {
            "employeeID": 9,
            "authorName": {
                "firstName": "Joseph",
                "preferredName": "Joe",
                "lastName": "Smith"
            },
            "title": "manager"
        }
    }
}

There are about millions of documents in my elasticsearch. One author object can be present in thousands of documents basically there is 1 to many relationship there.

Whenever the nested object author is updated, say title is updated i want to update all my documents which contain this author which could be millions of documents. Is there any elastic search query with which i can achieve this. I understand that there should be a bulk update process which should handle this, but is there any approach where i don't have to query all the documents which contain this object and then update those one by one.

Upvotes: 2

Views: 763

Answers (1)

Val
Val

Reputation: 217564

The _update_by_query endpoint is what you're looking for.

The command below will identify all documents for the author with employeeID: 9 (you can have whatever condition you want), and then it will replace the author fields with the ones in the script parameters:

POST your-index/_update_by_query?wait_for_completion=false&slices=auto&conflicts=proceed
{
  "script": {
    "source": "ctx._source.streamItemData.author.putAll(params)",
    "lang": "painless",
    "params": {
        "authorName": {
            "firstName": "Joseph",
            "preferredName": "Joe",
            "lastName": "Smith"
        },
        "title": "manager"
    }
  },
  "query": {
    "term": {
      "streamItemData.author.employeeID": "9"
    }
  }
}

Since you might be willing to update millions of documents, I've added wait_for_completion=false to the URL so that the update runs asynchronously. You can inspect the task while it's running using the Task management API

Upvotes: 3

Related Questions