Reputation: 326
I have my document structure like this:
{
  "documentID": 123,
  "originalFilename": "Build a Better Post.pdf",
  "modDate": "2017-11-16T18:22:54.48",
  "documentType": "pdf",
  "keySystem": "web",
  "title": "Build a Better Post",
  "createPreview": false,
  "uploadedBy": "DA5208B3-2198-44C6-8256-0AEBC4DD1588",
  "streamItemData": {
    "itemID": 800,
    "author": {
      "employeeID": 9,
      "authorName": {
        "firstName": "Joseph",
        "preferredName": "Joe",
        "lastName": "Smith"
      },
      "title": "manager"
    }
  }
}
There are millions of documents in my Elasticsearch index. One author object can be present in thousands of documents, so there is basically a one-to-many relationship.
Whenever the nested author object is updated, say its title changes, I want to update all the documents that contain this author, which could be millions of documents. Is there an Elasticsearch query with which I can achieve this? I understand there should be a bulk update process to handle it, but is there any approach where I don't have to query all the documents containing this object and then update them one by one?
Upvotes: 2
Views: 763
Reputation: 217564
The _update_by_query endpoint is what you're looking for.
The command below will identify all documents for the author with employeeID: 9 (you can use whatever condition you need), and then replace the author fields with the ones given in the script parameters:
POST your-index/_update_by_query?wait_for_completion=false&slices=auto&conflicts=proceed
{
  "script": {
    "source": "ctx._source.streamItemData.author.putAll(params)",
    "lang": "painless",
    "params": {
      "authorName": {
        "firstName": "Joseph",
        "preferredName": "Joe",
        "lastName": "Smith"
      },
      "title": "manager"
    }
  },
  "query": {
    "term": {
      "streamItemData.author.employeeID": "9"
    }
  }
}
Since you might need to update millions of documents, I've added wait_for_completion=false to the URL so that the update runs asynchronously, slices=auto so the work is parallelized across several slices, and conflicts=proceed so that version conflicts don't abort the whole task. You can inspect the task while it's running using the Task management API.
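With wait_for_completion=false, the call returns immediately with a task ID, something like {"task": "oTUltX4IQMOUUVeiohTt8A:12345"} (the ID below is just an illustration). You can then poll the task's progress, or cancel it if needed:
GET _tasks/oTUltX4IQMOUUVeiohTt8A:12345
POST _tasks/oTUltX4IQMOUUVeiohTt8A:12345/_cancel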
Upvotes: 3